18 rows where issue = 777333388 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at author_association body reactions issue performed_via_github_app
753524779 https://github.com/simonw/datasette/issues/1168#issuecomment-753524779 https://api.github.com/repos/simonw/datasette/issues/1168 MDEyOklzc3VlQ29tbWVudDc1MzUyNDc3OQ== simonw 9599 2021-01-02T20:19:26Z 2021-01-02T20:19:26Z OWNER

Idea: version the metadata scheme. If the table is called _metadata_v1 it gives me a clear path to designing a new scheme in the future.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Mechanism for storing metadata in _metadata tables 777333388  
753402423 https://github.com/simonw/datasette/issues/1168#issuecomment-753402423 https://api.github.com/repos/simonw/datasette/issues/1168 MDEyOklzc3VlQ29tbWVudDc1MzQwMjQyMw== simonw 9599 2021-01-01T23:16:05Z 2021-01-01T23:16:05Z OWNER

One catch: solving the "show me all metadata for everything in this Datasette instance" problem.

Ideally there would be a SQLite table that can be queried for this. But the need to resolve the potentially complex set of precedence rules means that table would be difficult if not impossible to provide at run-time.

Ideally a denormalized table would be available that featured the results of running those precedence rule calculations. But how to handle keeping this up-to-date? It would need to be recalculated any time a _metadata table in any of the attached databases had an update.

This is a much larger problem - but one potential fix would be to use triggers to maintain a "version number" for the _metadata table - similar to SQLite's own built-in schema_version mechanism. Triggers could increment a counter any time a record in that table was added, deleted or updated.

Such a mechanism would have applications outside of just this _metadata system. The ability to attach a version number to any table and have it automatically incremented when that table changes (via triggers) could help with all kinds of other Datasette-at-scale problems, including things like cached table counts.
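Here is a minimal sketch of the trigger idea using Python's `sqlite3` module. It assumes a hypothetical single-row `_metadata_version` table, which is not an existing Datasette or SQLite feature. Each trigger fires once per affected row, so the counter tells you *whether* the table changed. It does not count statements.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE _metadata (
    [database] TEXT,
    [table] TEXT,
    [column] TEXT,
    metadata TEXT
);

-- Hypothetical single-row table holding the version counter
CREATE TABLE _metadata_version (version INTEGER NOT NULL);
INSERT INTO _metadata_version (version) VALUES (0);

-- Bump the counter whenever a row is added, changed or removed
CREATE TRIGGER _metadata_after_insert AFTER INSERT ON _metadata
BEGIN
    UPDATE _metadata_version SET version = version + 1;
END;
CREATE TRIGGER _metadata_after_update AFTER UPDATE ON _metadata
BEGIN
    UPDATE _metadata_version SET version = version + 1;
END;
CREATE TRIGGER _metadata_after_delete AFTER DELETE ON _metadata
BEGIN
    UPDATE _metadata_version SET version = version + 1;
END;
""")


def metadata_version(conn):
    """Return the current counter value (hypothetical helper)."""
    return conn.execute("SELECT version FROM _metadata_version").fetchone()[0]


conn.execute(
    "INSERT INTO _metadata ([table], metadata) VALUES (?, ?)",
    ("mytable", '{"title": "My Table"}'),
)
conn.execute("UPDATE _metadata SET metadata = '{}' WHERE [table] = 'mytable'")
conn.execute("DELETE FROM _metadata WHERE [table] = 'mytable'")
print(metadata_version(conn))  # 3
```

A denormalized cache could record the counter value it was built from and rebuild only when `metadata_version()` returns something different.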

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Mechanism for storing metadata in _metadata tables 777333388  
753401001 https://github.com/simonw/datasette/issues/1168#issuecomment-753401001 https://api.github.com/repos/simonw/datasette/issues/1168 MDEyOklzc3VlQ29tbWVudDc1MzQwMTAwMQ== simonw 9599 2021-01-01T23:01:45Z 2021-01-01T23:01:45Z OWNER

I need to prototype this. Could I do that as a plugin? I think so - I could try out the algorithm for loading metadata and display it on pages using some custom templates.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Mechanism for storing metadata in _metadata tables 777333388  
753400420 https://github.com/simonw/datasette/issues/1168#issuecomment-753400420 https://api.github.com/repos/simonw/datasette/issues/1168 MDEyOklzc3VlQ29tbWVudDc1MzQwMDQyMA== simonw 9599 2021-01-01T22:53:58Z 2021-01-01T22:53:58Z OWNER

Precedence idea:
- First priority is non-_internal metadata from other databases - if those conflict, the database whose name sorts first alphabetically wins
- Next priority: _internal metadata, which should have been loaded from metadata.json
- Last priority: the _metadata table from that database itself, i.e. the default "baked in" metadata
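The three-tier ordering above could be written as a sort key. This minimal sketch assumes that values are merged key by key, which is one of the open questions raised in this thread. `precedence_key` and `resolve_metadata` are hypothetical names.

```python
def precedence_key(source_db, target_db):
    """Smaller keys mean higher priority."""
    if source_db == "_internal":
        return (1, "")  # loaded from metadata.json
    if source_db == target_db:
        return (2, "")  # baked into the described database itself
    return (0, source_db)  # other databases; alphabetical name breaks ties


def resolve_metadata(target_db, candidates):
    """candidates: list of (source_db, metadata_dict) pairs for one database/table/column."""
    merged = {}
    # Apply the lowest-priority source first so higher-priority sources overwrite its keys
    ordered = sorted(candidates, key=lambda c: precedence_key(c[0], target_db), reverse=True)
    for _source_db, metadata in ordered:
        merged.update(metadata)
    return merged


candidates = [
    ("fixtures", {"title": "Baked in", "license": "CC0"}),
    ("_internal", {"title": "From metadata.json"}),
    ("zeta", {"description": "From zeta"}),
    ("alpha", {"description": "From alpha"}),
]
print(resolve_metadata("fixtures", candidates))
# {'title': 'From metadata.json', 'license': 'CC0', 'description': 'From alpha'}
```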

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Mechanism for storing metadata in _metadata tables 777333388  
753400306 https://github.com/simonw/datasette/issues/1168#issuecomment-753400306 https://api.github.com/repos/simonw/datasette/issues/1168 MDEyOklzc3VlQ29tbWVudDc1MzQwMDMwNg== simonw 9599 2021-01-01T22:52:44Z 2021-01-01T22:52:44Z OWNER

Also: probably load column metadata as part of the table metadata rather than loading column metadata individually, since it's going to be rare to want the metadata for a single column rather than for an entire table full of columns.
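This sketch fetches a table's own metadata and all of its column metadata in a single query. It assumes the single `_metadata` table layout proposed elsewhere in this thread, where a NULL `database` means "this same file". The returned dict shape, with a nested `columns` key, is also an assumption.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE _metadata ([database] TEXT, [table] TEXT, [column] TEXT, metadata TEXT)"
)
conn.executemany(
    "INSERT INTO _metadata VALUES (?, ?, ?, ?)",
    [
        (None, "mytable", None, json.dumps({"title": "My Table"})),
        (None, "mytable", "mycolumn", json.dumps({"description": "Column description"})),
        (None, "mytable", "othercolumn", json.dumps({"description": "Another column"})),
    ],
)


def table_metadata(conn, table):
    """Load the table's own metadata plus every column's metadata in one query."""
    result = {"columns": {}}
    rows = conn.execute(
        "SELECT [column], metadata FROM _metadata"
        " WHERE [database] IS NULL AND [table] = ?",
        (table,),
    )
    for column, metadata in rows:
        if column is None:
            result.update(json.loads(metadata))  # the table-level row
        else:
            result["columns"][column] = json.loads(metadata)
    return result


print(table_metadata(conn, "mytable"))
```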

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Mechanism for storing metadata in _metadata tables 777333388  
753400265 https://github.com/simonw/datasette/issues/1168#issuecomment-753400265 https://api.github.com/repos/simonw/datasette/issues/1168 MDEyOklzc3VlQ29tbWVudDc1MzQwMDI2NQ== simonw 9599 2021-01-01T22:52:09Z 2021-01-01T22:52:09Z OWNER

From an implementation perspective, I think the way this works is SQL queries read the relevant metadata from ALL available metadata tables, then Python code solves the precedence rules to produce the final, combined metadata for a database/table/column.
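The "read everything with SQL" half could look like this sketch. It walks `PRAGMA database_list` and reads any `_metadata` table it finds, tagging each row with the schema it came from so that Python precedence logic can run afterwards. It assumes each attached schema name matches the Datasette database name, which is not true in general. `gather_metadata_rows` is a hypothetical helper.

```python
import json
import sqlite3


def gather_metadata_rows(conn):
    """Return (source, database, table, column, metadata) from every _metadata table."""
    rows = []
    for _seq, schema, _path in conn.execute("PRAGMA database_list").fetchall():
        # Schema names come from SQLite itself, so interpolating them is acceptable here
        found = conn.execute(
            f"SELECT 1 FROM [{schema}].sqlite_master"
            " WHERE type = 'table' AND name = '_metadata'"
        ).fetchone()
        if found is None:
            continue
        query = f"SELECT [database], [table], [column], metadata FROM [{schema}]._metadata"
        for database, table, column, metadata in conn.execute(query).fetchall():
            # A NULL database means "a table in this same file"
            rows.append((schema, database or schema, table, column, json.loads(metadata)))
    return rows


conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS metadb")  # stand-in for a separate metadata.db
ddl = "CREATE TABLE {}._metadata ([database] TEXT, [table] TEXT, [column] TEXT, metadata TEXT)"
conn.execute(ddl.format("main"))
conn.execute(ddl.format("metadb"))
conn.execute(
    "INSERT INTO main._metadata VALUES (NULL, 'mytable', NULL, ?)",
    (json.dumps({"title": "Baked in"}),),
)
conn.execute(
    "INSERT INTO metadb._metadata VALUES ('main', 'mytable', NULL, ?)",
    (json.dumps({"title": "From metadb"}),),
)
for row in gather_metadata_rows(conn):
    print(row)
```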

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Mechanism for storing metadata in _metadata tables 777333388  
753399635 https://github.com/simonw/datasette/issues/1168#issuecomment-753399635 https://api.github.com/repos/simonw/datasette/issues/1168 MDEyOklzc3VlQ29tbWVudDc1MzM5OTYzNQ== simonw 9599 2021-01-01T22:45:21Z 2021-01-01T22:50:21Z OWNER

Would also need to figure out the precedence rules:

  • What happens if the database has a _metadata table with data that conflicts with a remote metadata record from another database? I think the other database should win, because that allows plugins to override the default metadata for something.
  • Do JSON values get merged together? So if one table provides a description and another provides a title, do both values get returned?
  • If a database has a license, does that "cascade" down to the tables? What about source and about?
  • What if two (or more) databases provide conflicting metadata for a table in some other database? Also, _internal may have loaded data from metadata.json that conflicts with some other remote table metadata definition.
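The cascade question has one possible answer, sketched below. A hypothetical `CASCADING_KEYS` tuple lists the fields a table inherits from its database when it does not set them itself. Fields such as `title` never cascade. Which keys belong in that list is an assumption, not a decision recorded here.

```python
# Hypothetical list of fields that a table inherits from its database
CASCADING_KEYS = ("license", "license_url", "source", "source_url", "about", "about_url")


def with_cascade(database_metadata, table_metadata):
    """Table values always win; missing cascading keys are filled in from the database."""
    result = dict(table_metadata)
    for key in CASCADING_KEYS:
        if key not in result and key in database_metadata:
            result[key] = database_metadata[key]
    return result


database_metadata = {"title": "My DB", "license": "CC BY 4.0", "source": "Example source"}
table_metadata = {"title": "My Table", "source": "Table-specific source"}
print(with_cascade(database_metadata, table_metadata))
# {'title': 'My Table', 'source': 'Table-specific source', 'license': 'CC BY 4.0'}
```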
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Mechanism for storing metadata in _metadata tables 777333388  
753399428 https://github.com/simonw/datasette/issues/1168#issuecomment-753399428 https://api.github.com/repos/simonw/datasette/issues/1168 MDEyOklzc3VlQ29tbWVudDc1MzM5OTQyOA== simonw 9599 2021-01-01T22:43:14Z 2021-01-01T22:43:22Z OWNER

Could this use a compound primary key on database, table, column? Does that work with null values?
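A quick experiment answers the NULL question. SQLite allows NULLs in a composite PRIMARY KEY on an ordinary rowid table (a documented legacy quirk), and NULLs never compare equal for uniqueness purposes, so duplicate rows slip through. One workaround, sketched here, is a unique index on `ifnull()` expressions. Storing `''` instead of NULL would be another. The index name `_metadata_key` is hypothetical.

```python
import sqlite3

COLUMNS = "[database] TEXT, [table] TEXT, [column] TEXT, metadata TEXT"
ROW = "INSERT INTO _metadata VALUES (NULL, 'mytable', NULL, '{}')"

# 1. Compound primary key: NULLs are treated as distinct, so the duplicate is accepted
pk = sqlite3.connect(":memory:")
pk.execute(f"CREATE TABLE _metadata ({COLUMNS}, PRIMARY KEY ([database], [table], [column]))")
pk.execute(ROW)
pk.execute(ROW)
duplicates_allowed = pk.execute("SELECT count(*) FROM _metadata").fetchone()[0]
print(duplicates_allowed)  # 2

# 2. Unique index over expressions that map NULL to '' so the duplicate is rejected
ix = sqlite3.connect(":memory:")
ix.execute(f"CREATE TABLE _metadata ({COLUMNS})")
ix.execute(
    "CREATE UNIQUE INDEX _metadata_key"
    " ON _metadata (ifnull([database], ''), [table], ifnull([column], ''))"
)
ix.execute(ROW)
try:
    ix.execute(ROW)
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```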

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Mechanism for storing metadata in _metadata tables 777333388  
753399366 https://github.com/simonw/datasette/issues/1168#issuecomment-753399366 https://api.github.com/repos/simonw/datasette/issues/1168 MDEyOklzc3VlQ29tbWVudDc1MzM5OTM2Ng== simonw 9599 2021-01-01T22:42:37Z 2021-01-01T22:42:37Z OWNER

So what would the database schema for this look like?

I'm leaning towards a single table called _metadata, because that's a neater fit for baking the metadata into the database file along with the data that it is describing. Alternatively I could have multiple tables sharing that prefix - _metadata_database and _metadata_tables and _metadata_columns perhaps.

If it's just a single _metadata table, the schema could look like this:

| database | table | column | metadata |
|---|---|---|---|
| | mytable | | {"title": "My Table"} |
| | mytable | mycolumn | {"description": "Column description"} |
| otherdb | othertable | | {"description": "Table in another DB"} |

If the database column is null it means "this is describing a table in the same database file as this _metadata table".

The alternative to the metadata JSON column would be separate columns for each potential metadata value - license, source, about, about_url etc. But that makes it harder for people to create custom metadata fields.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Mechanism for storing metadata in _metadata tables 777333388  
753398542 https://github.com/simonw/datasette/issues/1168#issuecomment-753398542 https://api.github.com/repos/simonw/datasette/issues/1168 MDEyOklzc3VlQ29tbWVudDc1MzM5ODU0Mg== simonw 9599 2021-01-01T22:37:24Z 2021-01-01T22:37:24Z OWNER

The direction I'm leaning in now is the following:

  • Metadata always lives in SQLite tables
  • These tables can be co-located with the database they describe (same DB file)
  • ... or they can be in a different DB file and reference the other database that they are describing
  • Metadata provided on startup in a metadata.json file is loaded into an in-memory metadata table using that same mechanism

Plugins that want to provide metadata can do so by populating a table. They could even maintain their own in-memory database for this, or they could write to the _internal in-memory database, or they could write to a table in a database on disk.
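The last bullet above could work as in this sketch. It flattens a metadata.json-style dict into rows with the same `database`/`table`/`column`/`metadata` shape, held in an in-memory SQLite database. The code follows the `databases` → `tables` nesting of metadata.json. Two things are assumptions: that a table's `columns` entry is a name-to-description mapping, and that instance-level keys can be skipped for brevity. `flatten_metadata` is a hypothetical helper.

```python
import json
import sqlite3


def flatten_metadata(metadata):
    """Yield (database, table, column, metadata_json) rows from a metadata.json-style dict."""
    for db_name, db_meta in metadata.get("databases", {}).items():
        db_fields = {k: v for k, v in db_meta.items() if k != "tables"}
        if db_fields:
            yield (db_name, None, None, json.dumps(db_fields))
        for table_name, table_meta in db_meta.get("tables", {}).items():
            table_fields = {k: v for k, v in table_meta.items() if k != "columns"}
            if table_fields:
                yield (db_name, table_name, None, json.dumps(table_fields))
            for column_name, description in table_meta.get("columns", {}).items():
                yield (db_name, table_name, column_name, json.dumps({"description": description}))


# In practice this dict would come from json.load() on the metadata.json file
metadata_json = {
    "databases": {
        "fixtures": {
            "title": "Fixtures",
            "tables": {
                "mytable": {
                    "title": "My Table",
                    "columns": {"mycolumn": "Column description"},
                }
            },
        }
    }
}

conn = sqlite3.connect(":memory:")
# database is never NULL here: these rows live outside the files they describe
conn.execute("CREATE TABLE metadata ([database] TEXT, [table] TEXT, [column] TEXT, metadata TEXT)")
conn.executemany("INSERT INTO metadata VALUES (?, ?, ?, ?)", flatten_metadata(metadata_json))
for row in conn.execute("SELECT * FROM metadata"):
    print(row)
```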

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Mechanism for storing metadata in _metadata tables 777333388  
753392102 https://github.com/simonw/datasette/issues/1168#issuecomment-753392102 https://api.github.com/repos/simonw/datasette/issues/1168 MDEyOklzc3VlQ29tbWVudDc1MzM5MjEwMg== simonw 9599 2021-01-01T22:06:33Z 2021-01-01T22:06:33Z OWNER

Some SQLite databases include SQL comments in the schema definition which tell you what each column means:

CREATE TABLE User
        -- A table comment
(
        uid INTEGER,    -- A field comment
        flags INTEGER   -- Another field comment
);

The problem with these is that SQLite doesn't expose them through any mechanism other than the CREATE TABLE statement stored in the sqlite_master table, which has to be parsed to extract those comments.

I had an idea to build a plugin that could return these. That would be easy with a "get metadata for this column" plugin hook. In the absence of one, a plugin could still read the schemas on startup and use them to populate a metadata database table somewhere.
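Such a plugin could do roughly the following on startup. It reads the stored `CREATE TABLE` text from `sqlite_master`, where SQLite keeps essentially the original statement with comments included, and pulls out trailing `--` comments with regular expressions. The parsing is deliberately naive and line-based. It assumes one column definition per line and would miss `/* ... */` comments. `extract_comments` is a hypothetical name.

```python
import re
import sqlite3

SCHEMA = """
CREATE TABLE User
        -- A table comment
(
        uid INTEGER,    -- A field comment
        flags INTEGER   -- Another field comment
);
"""

# A column definition with a trailing "--" comment on the same line
COLUMN_COMMENT = re.compile(r"^\s*(\w+)\s+\w+[^-]*--\s*(.*?)\s*$")
# A line that holds nothing but a comment (taken here as the table comment)
LINE_COMMENT = re.compile(r"^\s*--\s*(.*?)\s*$")


def extract_comments(create_sql):
    """Return (table_comment, {column: comment}) using naive line-by-line parsing."""
    table_comment = None
    columns = {}
    for line in create_sql.splitlines():
        match = COLUMN_COMMENT.match(line)
        if match:
            columns[match.group(1)] = match.group(2)
            continue
        match = LINE_COMMENT.match(line)
        if match and table_comment is None:
            table_comment = match.group(1)
    return table_comment, columns


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
(create_sql,) = conn.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'User'"
).fetchone()
print(extract_comments(create_sql))
# ('A table comment', {'uid': 'A field comment', 'flags': 'Another field comment'})
```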

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Mechanism for storing metadata in _metadata tables 777333388  
753391869 https://github.com/simonw/datasette/issues/1168#issuecomment-753391869 https://api.github.com/repos/simonw/datasette/issues/1168 MDEyOklzc3VlQ29tbWVudDc1MzM5MTg2OQ== simonw 9599 2021-01-01T22:04:30Z 2021-01-01T22:04:30Z OWNER

The sticking point here seems to be the plugin hook. Allowing plugins to override the way the question "give me the metadata for this database/table/column" is answered makes the database-backed metadata mechanisms much more complicated to think about.

What if plugins didn't get to override metadata in this way, but could instead update the metadata in a persistent Datasette-managed storage mechanism?

Then maybe Datasette could do the following:

  • Maintain metadata in _internal that has been loaded from metadata.json
  • Know how to check a database for baked-in metadata (maybe in a _metadata table)
  • Know how to fall back on the _internal metadata if no baked-in metadata is available

If database files were optionally allowed to store metadata about tables that live in another database file, this could perhaps solve the plugin needs, since an "edit metadata" plugin would be able to edit records in a separate, dedicated metadata.db database to store new information about tables in other files.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Mechanism for storing metadata in _metadata tables 777333388  
753390791 https://github.com/simonw/datasette/issues/1168#issuecomment-753390791 https://api.github.com/repos/simonw/datasette/issues/1168 MDEyOklzc3VlQ29tbWVudDc1MzM5MDc5MQ== simonw 9599 2021-01-01T22:00:42Z 2021-01-01T22:00:42Z OWNER

Here are the requirements I'm currently trying to satisfy:

  • It should be possible to query the metadata for ALL attached tables in one place, potentially with pagination and filtering
  • Metadata should be able to exist in the current metadata.json file
  • It should also be possible to bundle metadata in a table in the SQLite database files themselves
  • Plugins should be able to define their own special mechanisms for metadata. This is particularly interesting for providing a UI that allows users to edit the metadata for their existing tables.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Mechanism for storing metadata in _metadata tables 777333388  
753390262 https://github.com/simonw/datasette/issues/1168#issuecomment-753390262 https://api.github.com/repos/simonw/datasette/issues/1168 MDEyOklzc3VlQ29tbWVudDc1MzM5MDI2Mg== simonw 9599 2021-01-01T21:58:11Z 2021-01-01T21:58:11Z OWNER

One possibility: plugins could write directly to that in-memory database table. But how would they know to write again should the server restart? Maybe they would write to it once when called by the startup plugin hook, and then update it (and their own backing store) when metadata changes for some reason. Feels a bit messy though.

Also: if I want to support metadata optionally living in a _metadata table colocated with the data in a SQLite database file itself, how would that affect the metadata columns in _internal? How often would Datasette denormalize and copy data across from the on-disk _metadata tables to the _internal in-memory columns?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Mechanism for storing metadata in _metadata tables 777333388  
753389938 https://github.com/simonw/datasette/issues/1168#issuecomment-753389938 https://api.github.com/repos/simonw/datasette/issues/1168 MDEyOklzc3VlQ29tbWVudDc1MzM4OTkzOA== simonw 9599 2021-01-01T21:54:15Z 2021-01-01T21:54:15Z OWNER

So what if the databases, tables and columns tables in _internal each grew a new metadata text column?

These columns could be populated by Datasette on startup through reading the metadata.json file. But how would plugins interact with them?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Mechanism for storing metadata in _metadata tables 777333388  
753389477 https://github.com/simonw/datasette/issues/1168#issuecomment-753389477 https://api.github.com/repos/simonw/datasette/issues/1168 MDEyOklzc3VlQ29tbWVudDc1MzM4OTQ3Nw== simonw 9599 2021-01-01T21:49:57Z 2021-01-01T21:49:57Z OWNER

What if metadata was stored in a JSON text column in the existing _internal tables? This would allow users to invent additional metadata fields in the future beyond the current license, license_url etc. fields, without needing a schema change.

The downside of JSON columns generally is that they're harder to run indexed queries against. For metadata I don't think that matters - even with 10,000 tables each with their own metadata a SQL query asking for e.g. "everything that has Apache 2 as the license" would return in just a few ms.
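Here is the kind of query described above, run against a hypothetical `tables` table with an added JSON `metadata` column (the column names are assumptions). No index covers the JSON, so SQLite scans every row. That is the trade-off discussed above.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for an _internal "tables" table that grew a JSON metadata column
conn.execute("CREATE TABLE tables (database_name TEXT, table_name TEXT, metadata TEXT)")
conn.executemany(
    "INSERT INTO tables VALUES (?, ?, ?)",
    [
        ("fixtures", "facetable", json.dumps({"license": "Apache 2.0"})),
        ("fixtures", "roadside_attractions", json.dumps({"license": "CC BY 4.0"})),
        ("github", "issues", json.dumps({"license": "Apache 2.0", "my_custom_field": True})),
        ("github", "users", None),  # no metadata at all
    ],
)

# json_extract() needs SQLite's JSON functions (built in by default since SQLite 3.38.0)
rows = conn.execute(
    """
    SELECT database_name, table_name FROM tables
    WHERE json_extract(metadata, '$.license') = ?
    ORDER BY database_name, table_name
    """,
    ("Apache 2.0",),
).fetchall()
print(rows)
```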

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Mechanism for storing metadata in _metadata tables 777333388  
753388809 https://github.com/simonw/datasette/issues/1168#issuecomment-753388809 https://api.github.com/repos/simonw/datasette/issues/1168 MDEyOklzc3VlQ29tbWVudDc1MzM4ODgwOQ== simonw 9599 2021-01-01T21:47:51Z 2021-01-01T21:47:51Z OWNER

A database that exposes metadata will have the same restriction as the new _internal database that exposes columns and tables, in that it needs to take permissions into account. A user should not be able to view metadata for tables that they are not able to see.

As such, I'd rather bundle any metadata tables into the existing _internal database so I don't have to solve that permissions problem in two places.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Mechanism for storing metadata in _metadata tables 777333388  
753366024 https://github.com/simonw/datasette/issues/1168#issuecomment-753366024 https://api.github.com/repos/simonw/datasette/issues/1168 MDEyOklzc3VlQ29tbWVudDc1MzM2NjAyNA== simonw 9599 2021-01-01T18:48:34Z 2021-01-01T18:48:34Z OWNER

Also: in #188 I proposed bundling metadata in the SQLite database itself alongside the data. This is a great way of ensuring metadata travels with the data when it is downloaded as a SQLite .db file. But how would that play with the idea of an in-memory _metadata table? Could that table perhaps offer views that join data across multiple attached physical databases?
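A view joining across attached databases is possible, with one restriction. SQLite only lets a TEMP view reference tables in other attached schemas, and a TEMP view exists only for the connection that created it. This sketch uses two in-memory databases as stand-ins for physical files. The schema names `dogs` and `cats` and the view name `all_metadata` are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# In-memory stand-ins for two physical .db files; real files would be attached by path
conn.execute("ATTACH DATABASE ':memory:' AS dogs")
conn.execute("ATTACH DATABASE ':memory:' AS cats")

for schema, table, title in [("dogs", "breeds", "Dog breeds"), ("cats", "colours", "Cat colours")]:
    conn.execute(
        f"CREATE TABLE {schema}._metadata"
        " ([database] TEXT, [table] TEXT, [column] TEXT, metadata TEXT)"
    )
    conn.execute(
        f"INSERT INTO {schema}._metadata VALUES (NULL, ?, NULL, ?)",
        (table, json.dumps({"title": title})),
    )

# A TEMP view may point into other attached databases; a view stored in a file may not
conn.execute("""
CREATE TEMP VIEW all_metadata AS
    SELECT 'dogs' AS source, * FROM dogs._metadata
    UNION ALL
    SELECT 'cats' AS source, * FROM cats._metadata
""")
rows = conn.execute("SELECT source, [table] FROM all_metadata ORDER BY source").fetchall()
print(rows)
```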

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Mechanism for storing metadata in _metadata tables 777333388  

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
, [performed_via_github_app] TEXT);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);