
Issue #2143: De-tangling Metadata before Datasette 1.0
id: 1855885427 · node_id: I_kwDOBm6k_c5unpBz · user: 15178711 · state: open · comments: 24 · created_at: 2023-08-18T00:51:50Z · updated_at: 2023-08-24T18:28:27Z · author_association: CONTRIBUTOR

Metadata in Datasette is a really powerful feature, but is a bit difficult to work with. It was initially a way to add "metadata" about your "data" in Datasette instances, like descriptions for databases/tables/columns, titles, source URLs, licenses, etc. But it later became the go-to spot for other Datasette features that have nothing to do with metadata, like permissions/plugins/canned queries.

Specifically, I've found the following problems when working with Datasette metadata:

  1. Metadata cannot be updated without restarting the entire Datasette instance.
  2. The metadata.json/metadata.yaml file has become a kitchen sink of unrelated (imo) features like plugin config, authentication config, and canned queries.
  3. The Python APIs for defining extra metadata are a bit awkward (the datasette.metadata() method, the get_metadata() hook, etc.)

Possible solutions

Here are a few ideas for Datasette core changes we can make to address these problems.

Re-vamp the Datasette Python metadata APIs

The Datasette object has a single datasette.metadata() method that's a bit difficult to work with. There's also no Python API for inserting new metadata, so plugins have to rely on the get_metadata() hook.

The get_metadata() hook can also be improved: it doesn't work with async functions yet, so you're quite limited in what you can do.

(I'm a bit fuzzy on what to actually do here, but I imagine it'll be very small breaking changes to a few Python methods)

Add an optional datasette_metadata table

Datasette should detect and use metadata stored in a new special table called datasette_metadata. This would be a regular table that a user can edit on their own, and would serve as a "live updating" source of metadata that can be changed while the Datasette instance is running.

Not too sure what the schema would look like, but I'd imagine:

```sql
CREATE TABLE datasette_metadata(
  level text,
  target any,
  key text,
  value any,
  primary key (level, target)
)
```

Every row in this table would map to a single metadata "entry".

  • level would be one of "datasette", "database", "table", "column", which is the "level" the entry describes. For example, level="table" means it is metadata about a specific table, level="database" for a specific database, or level="datasette" for the entire Datasette instance.
  • target would "point" to the specific object the entry metadata is about, and would depend on what level is specified.
      • level="database": target would be the string name of the database that the metadata entry is about. Ex "fixtures"
      • level="table": target would be a JSON array of two strings. The first element would be the database name, and the second would be the table name. Ex ["fixtures", "students"]
      • level="column": target would be a JSON array of 3 strings: the database name, table name, and column name. Ex ["fixtures", "students", "student_id"]
  • key would be the type of metadata entry the row has, similar to the current "keys" that exist in metadata.json. Ex "about_url", "source", "description", etc.
  • value would be the text value of the metadata entry: the literal text value of a description, about_url, column_label, etc.
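The target encoding described in the list could be sketched as a small helper. This is only an illustration: encode_target and LEVELS are hypothetical names, not part of Datasette, and the choice to give level="datasette" entries no target at all (None, stored as NULL) is an assumption.

```python
import json

# The four levels named in the proposal.
LEVELS = ("datasette", "database", "table", "column")


def encode_target(level, database=None, table=None, column=None):
    """Return the value to store in the target column (hypothetical helper)."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level!r}")
    if level == "datasette":
        # Assumption: instance-wide entries have no target.
        return None
    required = {
        "database": (database,),
        "table": (database, table),
        "column": (database, table, column),
    }[level]
    if any(part is None for part in required):
        raise ValueError(f"level={level!r} needs {len(required)} name(s)")
    if level == "database":
        # A plain string, not a JSON array.
        return database
    # Tables and columns use a JSON array of strings.
    return json.dumps(list(required))


table_target = encode_target("table", "fixtures", "students")
column_target = encode_target("column", "fixtures", "students", "student_id")
```

Keeping the encoding in one place would matter if lookups compare target as an exact string, since the JSON text would then have to be produced the same way every time.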

A quick sample:

| level | target | key | value |
| -- | -- | -- | -- |
| datasette | NULL | title | my datasette title... |
| database | fixtures | source | <description of my database source> |
| table | ["fixtures", "students"] | label_column | student_name |
| column | ["fixtures", "students", "birthdate"] | description | <description of the fixtures.students.birthdate column> |

This datasette_metadata table would be populated by other tools, and hopefully not manually by end users. Datasette Core could also offer a UI for editing entries in datasette_metadata, to update descriptions/columns on the fly.
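To make the "live updating" idea concrete, here is a minimal sketch using Python's sqlite3 module against an in-memory database. It is not how Datasette would necessarily implement this. The lookup() function is hypothetical, and the primary key here includes key (an assumption that departs from the schema sketched above) so that one target can carry several entries.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE datasette_metadata(
        level text,
        target any,
        key text,
        value any,
        primary key (level, target, key)  -- assumption: key added to the PK
    )
    """
)

# Rows mirroring the sample; level names follow the bullet list.
rows = [
    ("datasette", None, "title", "my datasette title..."),
    ("database", "fixtures", "source", "<description of my database source>"),
    ("table", json.dumps(["fixtures", "students"]), "label_column", "student_name"),
]
conn.executemany("INSERT INTO datasette_metadata VALUES (?, ?, ?, ?)", rows)


def lookup(conn, level, target, key):
    """Fetch one metadata value, or None if no entry exists (hypothetical)."""
    row = conn.execute(
        # IS (rather than =) also matches the NULL target of datasette-level rows.
        "SELECT value FROM datasette_metadata"
        " WHERE level = ? AND target IS ? AND key = ?",
        (level, target, key),
    ).fetchone()
    return row[0] if row else None


title_before = lookup(conn, "datasette", None, "title")

# Editing the table changes what the next lookup returns, with no restart.
conn.execute(
    "UPDATE datasette_metadata SET value = ? WHERE level = ? AND key = ?",
    ("a new title", "datasette", "title"),
)
title_after = lookup(conn, "datasette", None, "title")
```

Because every lookup reads the table directly, an edit is visible immediately. A real implementation would probably want some caching, which is left out here.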

Re-vamp metadata.json and move non-metadata config to another place

The motivation behind this is that it's awkward that metadata.json contains config about things that are not strictly metadata, including:

  • Plugin configuration
  • Authentication/permissions (ex the allow key on datasettes/databases/tables)
  • Canned queries. This might be controversial, but in my mind, canned queries are application-specific code and configuration, and don't describe the data that exists in SQLite databases.

I think we should move these outside of metadata.json and into a different file. The datasette.json idea in #2093 may be a good solution here: plugin/permissions/canned queries can be defined in datasette.json, while metadata.json/datasette_metadata will strictly be about documenting databases/tables/columns.
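A rough sketch of what that split could look like for an existing metadata.json, assuming the non-metadata keys are "plugins", "allow", and "queries", and that nesting happens under "databases" and "tables". split_metadata is a hypothetical helper, and the layout of the resulting config dict is an assumption, not the datasette.json format from #2093.

```python
import copy

# Assumption: these keys are configuration rather than metadata.
NON_METADATA_KEYS = {"plugins", "allow", "queries"}
# Keys whose values map names to nested database/table blocks.
CONTAINER_KEYS = {"databases", "tables"}


def split_metadata(metadata):
    """Return (metadata_only, config_only) for a metadata.json-style dict."""
    meta, config = {}, {}
    for key, value in metadata.items():
        if key in NON_METADATA_KEYS:
            config[key] = copy.deepcopy(value)
        elif key in CONTAINER_KEYS and isinstance(value, dict):
            meta_children, config_children = {}, {}
            for name, child in value.items():
                child_meta, child_config = split_metadata(child)
                if child_meta:
                    meta_children[name] = child_meta
                if child_config:
                    config_children[name] = child_config
            if meta_children:
                meta[key] = meta_children
            if config_children:
                config[key] = config_children
        else:
            meta[key] = copy.deepcopy(value)
    return meta, config


example = {
    "title": "My instance",
    "plugins": {"some-plugin": {"option": 1}},
    "databases": {
        "fixtures": {
            "source": "Example source",
            "allow": {"id": "root"},
            "queries": {"recent": {"sql": "select 1"}},
            "tables": {
                "students": {"description": "Example", "allow": {"id": "root"}}
            },
        }
    },
}
metadata_only, config_only = split_metadata(example)
```

After the split, metadata_only keeps only titles, sources, and descriptions, while config_only keeps the same database/table nesting for plugins, permissions, and canned queries.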
