issue_comments

11 rows where issue = 1855885427 and "updated_at" is on date 2023-08-18 sorted by updated_at descending


user 2

  • simonw 8
  • asg017 3

author_association 2

  • OWNER 8
  • CONTRIBUTOR 3

issue 1

  • De-tangling Metadata before Datasette 1.0 · 11 ✖
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions issue performed_via_github_app
1684496274 https://github.com/simonw/datasette/issues/2143#issuecomment-1684496274 https://api.github.com/repos/simonw/datasette/issues/2143 IC_kwDOBm6k_c5kZ1-S asg017 15178711 2023-08-18T22:30:45Z 2023-08-18T22:30:45Z CONTRIBUTOR

That said, I do really like a bias towards settings that can be changed at runtime

Does this include things like --settings values or plugin config? I can totally see being able to update metadata without restarting, but not sure if that would work well with --setting, plugin config, or auth/permissions stuff.

Well it could work with --setting and auth/permissions, with a lot of core changes. But changing plugin config on the fly could be challenging, for plugin authors.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
De-tangling Metadata before Datasette 1.0 1855885427  
1684488526 https://github.com/simonw/datasette/issues/2143#issuecomment-1684488526 https://api.github.com/repos/simonw/datasette/issues/2143 IC_kwDOBm6k_c5kZ0FO simonw 9599 2023-08-18T22:18:39Z 2023-08-18T22:18:39Z OWNER

Another option would be, instead of flat datasette.json/datasette.yaml files, we could instead use a Python file, like datasette_config.py. That way one could dynamically generate config (ex dev vs prod, auto-discover credentials, etc.). Kinda like Django settings.

I'm not a fan of that. Software history is full of projects that implemented configuration-as-code and later regretted it - the most recent example is setup.py in Python turning into pyproject.toml, but I feel like I've seen that pattern play out elsewhere too.

I don't think having people dynamically generate JSON/YAML for their configuration is a big burden. I'd have to see some very compelling use-cases to convince me otherwise.

That said, I do really like a bias towards settings that can be changed at runtime. Datasette has suffered a bit from some settings that can't be easily changed at runtime already - hence my gnarly https://github.com/simonw/datasette-remote-metadata plugin.

For things like Datasette Cloud for example the more people can configure without rebooting their container the better!

I don't think live reconfiguration at runtime is incompatible with JSON/YAML configuration though. Caddy is one of my favourite examples of software that can be entirely re-configured at runtime by POSTing a big blob of JSON to it: https://caddyserver.com/docs/quick-starts/api
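
To make the Caddy comparison concrete, here is a minimal sketch of what "POSTing a big blob of JSON" looks like from Python. It assumes Caddy's admin API is on its default address (`localhost:2019`) and uses its `/load` endpoint; the config contents are placeholders, and the request is only built, not sent.

```python
import json
import urllib.request

# Assumption: Caddy's admin API is listening on its default address.
CADDY_ADMIN = "http://localhost:2019"


def build_load_request(config: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /load request carrying a full config."""
    body = json.dumps(config).encode("utf-8")
    return urllib.request.Request(
        f"{CADDY_ADMIN}/load",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder config, for illustration only.
example_config = {"apps": {"http": {"servers": {}}}}
request = build_load_request(example_config)
# urllib.request.urlopen(request) would apply it to a running Caddy instance.
```

The point is that the configuration stays plain JSON on disk and on the wire; only the delivery mechanism makes it "live".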

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
De-tangling Metadata before Datasette 1.0 1855885427  
1684485591 https://github.com/simonw/datasette/issues/2143#issuecomment-1684485591 https://api.github.com/repos/simonw/datasette/issues/2143 IC_kwDOBm6k_c5kZzXX simonw 9599 2023-08-18T22:14:35Z 2023-08-18T22:14:35Z OWNER

Actually there is one thing that I'm not comfortable about with respect to the existing design: the way the database / tables stuff is nested.

They assume that the user will attach the database to Datasette using a fixed name - docs.db or whatever.

But what if we want to support users downloading databases from each other and attaching them to Datasette where those DBs might carry some of their own configuration?

Moving metadata into the databases makes sense there, but what about database-specific settings like the default sort order for a table, or configured canned queries?

Having those tied to the filename of the database itself feels unpleasant to me. But how else could we handle this?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
De-tangling Metadata before Datasette 1.0 1855885427  
1684484426 https://github.com/simonw/datasette/issues/2143#issuecomment-1684484426 https://api.github.com/repos/simonw/datasette/issues/2143 IC_kwDOBm6k_c5kZzFK simonw 9599 2023-08-18T22:12:52Z 2023-08-18T22:12:52Z OWNER

Yeah, I'm convinced by that. There's no point in having both settings.json and datasette.json.

I like datasette.json ( / datasette.yml) as a name. That can be the file that lives in your config directory too, so if you run datasette . in a folder containing datasette.yml all of those settings get picked up.

Here's a thought for how it could look - I'll go with the YAML format because I expect that to be the default most people use, just because it supports multi-line strings better.

I based this on the big example at https://docs.datasette.io/en/1.0a3/metadata.html#using-yaml-for-metadata - and combined some bits from https://docs.datasette.io/en/1.0a3/authentication.html as well.

```yaml
title: Demonstrating Metadata from YAML
description_html: |-
  <p>This description includes a long HTML string</p>
  <ul>
    <li>YAML is better for embedding HTML strings than JSON!</li>
  </ul>
settings:
  default_page_size: 10
  max_returned_rows: 3000
  sql_time_limit_ms: 8000
databases:
  docs:
    permissions:
      create-table:
        id: editor
  fixtures:
    tables:
      no_primary_key:
        hidden: true
    queries:
      neighborhood_search:
        sql: |-
          select neighborhood, facet_cities.name, state
          from facetable join facet_cities on facetable.city_id = facet_cities.id
          where neighborhood like '%' || :text || '%' order by neighborhood;
        title: Search neighborhoods
        description_html: |-
          <p>This demonstrates basic LIKE search</p>
permissions:
  debug-menu:
    id: '*'
plugins:
  datasette-ripgrep:
    path: /usr/local/lib/python3.11/site-packages
```

I'm inclined to say we try to be a super-set of the existing `metadata.yml` format, at least where it makes sense to do so. That way the upgrade path is smooth for people. Also, I don't think the format itself is terrible - it's the name that's the big problem.

In this example I've mixed in one extra concept: that settings: block with a bunch of settings in it.

There are some things in there that look a little bit like metadata - the title and description_html fields.

But are they metadata? The title and description of the overall instance feels like it could be described as general configuration. The stuff for the query should live where the query itself is defined.

Note that queries can be defined by a plugin hook too: https://docs.datasette.io/en/1.0a3/plugin_hooks.html#canned-queries-datasette-database-actor

What do you think? Is this the right direction, or are you thinking there's a more radical redesign that would make sense here?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
De-tangling Metadata before Datasette 1.0 1855885427  
1684205563 https://github.com/simonw/datasette/issues/2143#issuecomment-1684205563 https://api.github.com/repos/simonw/datasette/issues/2143 IC_kwDOBm6k_c5kYu_7 asg017 15178711 2023-08-18T17:12:54Z 2023-08-18T17:12:54Z CONTRIBUTOR

Another option would be, instead of flat datasette.json/datasette.yaml files, we could instead use a Python file, like datasette_config.py. That way one could dynamically generate config (ex dev vs prod, auto-discover credentials, etc.). Kinda like Django settings.

Though I imagine Python imports might make this complex to do, and json/yaml is already supported and pretty easy to write

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
De-tangling Metadata before Datasette 1.0 1855885427  
1684202932 https://github.com/simonw/datasette/issues/2143#issuecomment-1684202932 https://api.github.com/repos/simonw/datasette/issues/2143 IC_kwDOBm6k_c5kYuW0 asg017 15178711 2023-08-18T17:10:21Z 2023-08-18T17:10:21Z CONTRIBUTOR

I agree with all your points!

I think the best solution would be having a datasette.json config file, where you "configure" your datasette instances, with settings, permissions/auth, plugin configuration, and table settings (sortable column, label columns, etc.). Which #2093 would do.

Then optionally, you have a metadata.json, or use datasette_metadata, or some other plugin to define metadata (ex the future sqlite-docs plugin).

Everything in datasette.json could also be overwritten by CLI flags, like --setting key value, --plugin xxxx key value.
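
As a rough sketch of that precedence rule (file first, CLI flags on top): the function and argument names below are made up for illustration, and the `--plugin` flag shape is just the one proposed above, not an existing Datasette option.

```python
import copy


def apply_cli_overrides(file_config, setting_overrides=(), plugin_overrides=()):
    """Return a new config dict with CLI values layered over file values.

    setting_overrides: (key, value) pairs, as from --setting key value
    plugin_overrides: (plugin, key, value) triples, as from the proposed
                      --plugin xxxx key value flag (hypothetical)
    """
    merged = copy.deepcopy(file_config)  # never mutate what was read from disk
    settings = merged.setdefault("settings", {})
    for key, value in setting_overrides:
        settings[key] = value
    plugins = merged.setdefault("plugins", {})
    for plugin, key, value in plugin_overrides:
        plugins.setdefault(plugin, {})[key] = value
    return merged


file_config = {
    "settings": {"default_page_size": 10, "sql_time_limit_ms": 8000},
    "plugins": {"datasette-ripgrep": {"path": "/usr/local/lib/python3.11/site-packages"}},
}
merged = apply_cli_overrides(
    file_config,
    setting_overrides=[("sql_time_limit_ms", 4000)],
    plugin_overrides=[("datasette-upload-csvs", "default-database", "data")],
)
```

Values that only exist in the file survive untouched; anything named on the command line wins.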

We could even completely remove settings.json in favor of just datasette.json, mostly because I think the fewer files the better, especially if they have generic names like settings.json or config.json.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
De-tangling Metadata before Datasette 1.0 1855885427  
1683429959 https://github.com/simonw/datasette/issues/2143#issuecomment-1683429959 https://api.github.com/repos/simonw/datasette/issues/2143 IC_kwDOBm6k_c5kVxpH simonw 9599 2023-08-18T06:43:33Z 2023-08-18T15:19:07Z OWNER

The single biggest design challenge I've had with metadata relates to how it should or should not be inherited.

If you apply a license to a Datasette instance, it feels like that should flow down to cover all of the databases and all of the tables within those databases.

If the license is at the database level, it should cover all tables.

But... should source do the same thing? I made it behave the same way as license, but it's presumably common for one database to have a single license but multiple different sources of data.

Then there's title - should that inherit? It feels like title should apply to only one level - you may want a title that applies to the instance, then a different custom title for databases and tables.
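
One way to pin the inheritance question down is to make the set of inheriting keys explicit. This is only a sketch: `INHERITED_KEYS` and `resolve_metadata` are hypothetical names, and which keys belong in the set is exactly the open design question, not Datasette's current behaviour.

```python
# Assumption: only license-style keys flow down; this is a design choice
# for illustration, not what Datasette does today.
INHERITED_KEYS = {"license", "license_url"}


def resolve_metadata(key, instance, database=None, table=None):
    """Look up key at the most specific level given.

    Keys in INHERITED_KEYS fall back table -> database -> instance;
    every other key is only read from the most specific level.
    """
    levels = [level for level in (table, database, instance) if level is not None]
    if key not in INHERITED_KEYS:
        levels = levels[:1]
    for level in levels:
        if key in level:
            return level[key]
    return None


instance = {"title": "My Datasette", "license": "CC BY 4.0"}
database = {"title": "Docs database", "source": "Scraped docs"}
table = {}

table_license = resolve_metadata("license", instance, database, table)
table_title = resolve_metadata("title", instance, database, table)
table_source = resolve_metadata("source", instance, database, table)
database_title = resolve_metadata("title", instance, database)
```

Here the table picks up the instance license, but gets no title or source of its own, while the database keeps its own title.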

Here's the current state of play for metadata: https://docs.datasette.io/en/1.0a3/metadata.html

So there's title and description - and I'll be honest, I'm not 100% sure even I understand how those should be inherited down by tables/etc.

There's description_html which over-rides the description if it is set. It's a useful customization hack, but a bit surprising.

Then there are these six:

  • license
  • license_url
  • source
  • source_url
  • about
  • about_url

I added about later than the others, because I realized that plenty of my own projects needed a link to an article explaining them somewhere - e.g. https://scotrail.datasette.io/

Tables can also have column descriptions - just a string for each column. There's a demo of those here: https://latest.datasette.io/fixtures/roadside_attractions

And then there's all of the other stuff, most of which feels much more like "settings" than "metadata":

  • sort: created - the custom sort order
  • size: 10 for a custom page size for a specific table
  • sortable_columns to set which columns can be used to sort
  • hidden: true to hide a table
  • label_column: title is an interesting one - it lets you hint to Datasette which column should be displayed when there is a foreign key relationship. It's sort-of-metadata and sort-of-a-setting.
  • facets sets default facets, see https://docs.datasette.io/en/1.0a3/facets.html#facets-in-metadata
  • facet_size sets the number of facets to display
  • fts_table and fts_pk can be used to configure FTS, especially for views: https://docs.datasette.io/en/1.0a3/full_text_search.html

And the authentication stuff! allow and allow_sql blocks: https://docs.datasette.io/en/1.0a3/authentication.html#defining-permissions-with-allow-blocks

And the new permissions key in the 1.0 alphas: https://docs.datasette.io/en/1.0a3/authentication.html#other-permissions-in-metadata

I think that might be everything (excluding the plugins settings stuff, which is also a bad fit for metadata.)

And to make things even more confusing... I believe you can add arbitrary key/value pairs to your metadata and then use them in your templates! I think I've heard from at least one person who uses that ability.
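
To show how tangled a single table block can get, here is a sketch that sorts its keys into three buckets. The two key sets are taken from the lists above (plus `columns`, assumed here to be the key that holds column descriptions); where each key really belongs is a judgment call, and `split_table_block` is a made-up helper.

```python
# Assumption: this split is one possible reading of the lists above.
METADATA_KEYS = {
    "title", "description", "description_html",
    "license", "license_url", "source", "source_url", "about", "about_url",
    "columns",
}
CONFIG_KEYS = {
    "sort", "size", "sortable_columns", "hidden", "label_column",
    "facets", "facet_size", "fts_table", "fts_pk",
    "allow", "allow_sql", "permissions",
}


def split_table_block(block):
    """Partition one table's keys into metadata, config and custom buckets."""
    buckets = {"metadata": {}, "config": {}, "custom": {}}
    for key, value in block.items():
        if key in METADATA_KEYS:
            buckets["metadata"][key] = value
        elif key in CONFIG_KEYS:
            buckets["config"][key] = value
        else:
            # Arbitrary key/value pairs that templates may rely on.
            buckets["custom"][key] = value
    return buckets


example = {
    "title": "Roadside attractions",
    "license": "CC BY 4.0",
    "sort": "created",
    "hidden": True,
    "label_column": "title",
    "my_template_flag": "yes",
}
buckets = split_table_block(example)
```

The "custom" bucket is the awkward one: any split has to decide whether those free-form keys travel with metadata or with configuration.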

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
De-tangling Metadata before Datasette 1.0 1855885427  
1683420879 https://github.com/simonw/datasette/issues/2143#issuecomment-1683420879 https://api.github.com/repos/simonw/datasette/issues/2143 IC_kwDOBm6k_c5kVvbP simonw 9599 2023-08-18T06:33:24Z 2023-08-18T15:15:34Z OWNER

I completely agree: metadata is a mess, and it deserves our attention.

  1. Metadata cannot be updated without re-starting the entire Datasette instance.

That's not completely true - there are hacks around that. I have a plugin that applies one set of gnarly hacks for that here: https://github.com/simonw/datasette-remote-metadata - it's pretty grim though!

  1. The metadata.json/metadata.yaml has become a kitchen sink of unrelated (imo) features like plugin config, authentication config, canned queries

100% this: it's a complete mess.

Datasette used to have a datasette --config foo:bar mechanism, which I deprecated in favour of datasette --setting foo bar partly because I wanted to free up --config for pointing at a real config file, so we could stop dropping everything in --metadata metadata.yml.

  1. The Python APIs for defining extra metadata are a bit awkward (the datasette.metadata() class, get_metadata() hook, etc.)

Yes, they're not pretty at all.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
De-tangling Metadata before Datasette 1.0 1855885427  
1683443891 https://github.com/simonw/datasette/issues/2143#issuecomment-1683443891 https://api.github.com/repos/simonw/datasette/issues/2143 IC_kwDOBm6k_c5kV1Cz simonw 9599 2023-08-18T06:58:15Z 2023-08-18T06:58:15Z OWNER

Hah, that --plugin-secret thing was a messy solution I came up with to the problem that all metadata is visible at /-/metadata - so if you need to stash a secret you need a way to keep it not-visible in there!

Hence the whole $env mess: https://docs.datasette.io/en/stable/plugins.html#secret-configuration-values

```json
{
    "plugins": {
        "datasette-auth-github": {
            "client_secret": {
                "$env": "GITHUB_CLIENT_SECRET"
            }
        }
    }
}
```

If configuration and metadata were separate we could ditch that whole messy situation - configuration can stay hidden, metadata can stay public.
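
For anyone unfamiliar with the `$env` trick, this is roughly what expanding it involves. It is a sketch of the idea, not Datasette's actual implementation; `resolve_secrets` is a hypothetical helper and the `environ` argument exists only to make it easy to test.

```python
import os


def resolve_secrets(value, environ=None):
    """Recursively replace {"$env": "NAME"} dicts with the environment value."""
    environ = os.environ if environ is None else environ
    if isinstance(value, dict):
        if set(value) == {"$env"}:
            # Missing variables resolve to None rather than raising.
            return environ.get(value["$env"])
        return {k: resolve_secrets(v, environ) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve_secrets(v, environ) for v in value]
    return value


config = {
    "plugins": {
        "datasette-auth-github": {
            "client_id": "public-id",
            "client_secret": {"$env": "GITHUB_CLIENT_SECRET"},
        }
    }
}
resolved = resolve_secrets(config, environ={"GITHUB_CLIENT_SECRET": "s3cret"})
```

The unresolved `config` is safe to display publicly; only `resolved` contains the secret, which is the whole reason the indirection exists.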

Though I have been thinking that Datasette might benefit from a "secrets" mechanism that's separate from configuration and metadata... kind of like what LLM has: https://llm.datasette.io/en/stable/help.html#llm-keys-help

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
De-tangling Metadata before Datasette 1.0 1855885427  
1683440597 https://github.com/simonw/datasette/issues/2143#issuecomment-1683440597 https://api.github.com/repos/simonw/datasette/issues/2143 IC_kwDOBm6k_c5kV0PV simonw 9599 2023-08-18T06:54:49Z 2023-08-18T06:54:49Z OWNER

A related point that I've been considering a lot recently: it turns out that sometimes I really want to define settings on the CLI instead of in a file, purely for convenience.

It's pretty annoying when I want to try out a new plugin but I have to create a dedicated metadata.yml file for it just to setup a single option - I'd love to have the option to be able to run this instead:

```bash
datasette data.db --plugin-setting datasette-upload-csvs default-database data
```

So maybe there's a world in which all of the settings can be applied in a datasette.yml file OR with command-line options.

That gets trickier when you need to pass a nested structure or similar, but we could always support those as JSON:

```bash
datasette data.db --plugin-setting datasette-emoji-reactions emoji '["😼", "🐺"]'
```

Note that we kind of have precedent for this in datasette publish: https://docs.datasette.io/en/stable/publish.html#custom-metadata-and-plugins

```bash
datasette publish heroku my_database.db \
    --name my-heroku-app-demo \
    --install=datasette-auth-github \
    --plugin-secret datasette-auth-github client_id your_client_id \
    --plugin-secret datasette-auth-github client_secret your_client_secret
```
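
The "support those as JSON" idea for `--plugin-setting` could be as simple as trying JSON first and falling back to a plain string. A sketch, with hypothetical helper names; `--plugin-setting` itself is only a proposal at this point.

```python
import json


def parse_setting_value(raw):
    """Treat the value as JSON if it parses, otherwise keep it as a string."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return raw


def plugin_settings_to_config(triples):
    """Turn (plugin, key, raw_value) triples into a nested plugins block."""
    plugins = {}
    for plugin, key, raw in triples:
        plugins.setdefault(plugin, {})[key] = parse_setting_value(raw)
    return {"plugins": plugins}


config = plugin_settings_to_config(
    [
        ("datasette-upload-csvs", "default-database", "data"),
        ("datasette-emoji-reactions", "emoji", '["😼", "🐺"]'),
    ]
)
```

One catch with this approach: a bare `10` or `true` on the command line becomes a number or boolean, so a plugin that wants the literal string would need it quoted as JSON.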

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
De-tangling Metadata before Datasette 1.0 1855885427  
1683435579 https://github.com/simonw/datasette/issues/2143#issuecomment-1683435579 https://api.github.com/repos/simonw/datasette/issues/2143 IC_kwDOBm6k_c5kVzA7 simonw 9599 2023-08-18T06:49:39Z 2023-08-18T06:49:39Z OWNER

My ideal situation then would be something like this:

  • Metadata itself is VERY clearly described, including sensible rules for metadata inheritance where it makes sense. There is a datasette.X method for accessing it which is much more intuitive than datasette.metadata().
  • It's possible that method should be an async method, because that would support things like plugins that lookup metadata in database tables better.
  • All templates etc switch to the new, clean, intuitive metadata mechanism before 1.0.
  • I'm interested in the option of metadata being able to live in a _datasette_metadata table in the databases themselves - either as a plugin or as a core feature. I think it makes a lot of sense for metadata to optionally live with the data that it describes.
  • Configuration gets split from metadata. The stuff that configures Datasette no longer lives in the metadata.yml file - it lives in config.yml (or even datasette.yml).
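
To illustrate why an async accessor pairs well with metadata stored in the database, here is a sketch that reads from a `_datasette_metadata` table without blocking the event loop. The table layout (`table_name`, `key`, `value`) and the function names are assumptions made for this example; nothing here is an existing Datasette API. It needs Python 3.9+ for `asyncio.to_thread`.

```python
import asyncio
import sqlite3


def _read_table_metadata(conn, table_name):
    """Blocking read of all key/value pairs recorded for one table."""
    rows = conn.execute(
        "select key, value from _datasette_metadata where table_name = ?",
        (table_name,),
    ).fetchall()
    return dict(rows)


async def get_table_metadata(conn, table_name):
    """Async wrapper: run the blocking SQLite read in a worker thread."""
    return await asyncio.to_thread(_read_table_metadata, conn, table_name)


# check_same_thread=False lets the worker thread use this connection.
conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute(
    "create table _datasette_metadata (table_name text, key text, value text)"
)
conn.executemany(
    "insert into _datasette_metadata values (?, ?, ?)",
    [
        ("docs", "title", "Documentation"),
        ("docs", "license", "CC BY 4.0"),
        ("other", "title", "Something else"),
    ],
)
docs_metadata = asyncio.run(get_table_metadata(conn, "docs"))
```

A synchronous `datasette.metadata()`-style call cannot do this without blocking, which is the argument for making the replacement method async.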

Currently we have three types of things:

  • Metadata - information about the data
  • Configuration - stuff like "these columns should be sortable" and "this is configured as fts_table" and suchlike
  • Settings - the stuff that you pass to datasette --setting x y on server start.

Should settings and configuration be separate? I'm not 100% sure that they should - maybe those two concepts should be combined somehow.

Configuration directory mode needs to be considered too: https://docs.datasette.io/en/stable/settings.html#configuration-directory-mode - interestingly it already has a thing where it can pick up settings from a settings.json file - where settings are things like datasette --setting sql_time_limit_ms 4000.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
De-tangling Metadata before Datasette 1.0 1855885427  

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
, [performed_via_github_app] TEXT);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);
Powered by Datasette · Queries took 28.521ms · About: github-to-sqlite