issue_comments

32 rows where issue = 770436876 sorted by updated_at descending

user 2

  • simonw 31
  • noklam 1

author_association 2

  • OWNER 31
  • NONE 1

issue 1

  • Maintain an in-memory SQLite table of connected databases and their tables · 32
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions issue performed_via_github_app
751476406 https://github.com/simonw/datasette/issues/1150#issuecomment-751476406 https://api.github.com/repos/simonw/datasette/issues/1150 MDEyOklzc3VlQ29tbWVudDc1MTQ3NjQwNg== noklam 18221871 2020-12-27T14:51:39Z 2020-12-27T14:51:39Z NONE

I like the idea of _internal, it's a nice way to get a data catalog quickly. I wonder if this trick applies to databases other than SQLite.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Maintain an in-memory SQLite table of connected databases and their tables 770436876  
748354841 https://github.com/simonw/datasette/issues/1150#issuecomment-748354841 https://api.github.com/repos/simonw/datasette/issues/1150 MDEyOklzc3VlQ29tbWVudDc0ODM1NDg0MQ== simonw 9599 2020-12-18T22:43:49Z 2020-12-18T22:43:49Z OWNER

For a demo, visit https://latest.datasette.io/login-as-root and then hit https://latest.datasette.io/_schemas

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Maintain an in-memory SQLite table of connected databases and their tables 770436876  
748352106 https://github.com/simonw/datasette/issues/1150#issuecomment-748352106 https://api.github.com/repos/simonw/datasette/issues/1150 MDEyOklzc3VlQ29tbWVudDc0ODM1MjEwNg== simonw 9599 2020-12-18T22:34:40Z 2020-12-18T22:34:40Z OWNER

Needs documentation, but I can wait to write that until I've tested out the feature a bit more.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Maintain an in-memory SQLite table of connected databases and their tables 770436876  
748351350 https://github.com/simonw/datasette/issues/1150#issuecomment-748351350 https://api.github.com/repos/simonw/datasette/issues/1150 MDEyOklzc3VlQ29tbWVudDc0ODM1MTM1MA== simonw 9599 2020-12-18T22:32:13Z 2020-12-18T22:32:13Z OWNER

Getting all the tests to pass is tricky because this adds a whole extra database to Datasette - and there's various code that loops through ds.databases as part of the tests.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Maintain an in-memory SQLite table of connected databases and their tables 770436876  
748260875 https://github.com/simonw/datasette/issues/1150#issuecomment-748260875 https://api.github.com/repos/simonw/datasette/issues/1150 MDEyOklzc3VlQ29tbWVudDc0ODI2MDg3NQ== simonw 9599 2020-12-18T18:55:12Z 2020-12-18T18:55:12Z OWNER

I'm going to move the code into a utils/schemas.py module, to avoid further extending the Datasette class definition and to make it more easily testable.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Maintain an in-memory SQLite table of connected databases and their tables 770436876  
748260118 https://github.com/simonw/datasette/issues/1150#issuecomment-748260118 https://api.github.com/repos/simonw/datasette/issues/1150 MDEyOklzc3VlQ29tbWVudDc0ODI2MDExOA== simonw 9599 2020-12-18T18:54:12Z 2020-12-18T18:54:12Z OWNER

I'm going to tidy this up and land it. A couple of additional decisions:

  • The database will be called /_schemas
  • By default it will only be visible to root - thus avoiding having to solve the permissions problem with regards to users seeing schemas for tables that are otherwise invisible to them.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Maintain an in-memory SQLite table of connected databases and their tables 770436876  
747893704 https://github.com/simonw/datasette/issues/1150#issuecomment-747893704 https://api.github.com/repos/simonw/datasette/issues/1150 MDEyOklzc3VlQ29tbWVudDc0Nzg5MzcwNA== simonw 9599 2020-12-18T06:19:13Z 2020-12-18T06:19:13Z OWNER

I'm not going to block this issue on permissions - I will tackle the efficient bulk permissions problem in #1152.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Maintain an in-memory SQLite table of connected databases and their tables 770436876  
747864831 https://github.com/simonw/datasette/issues/1150#issuecomment-747864831 https://api.github.com/repos/simonw/datasette/issues/1150 MDEyOklzc3VlQ29tbWVudDc0Nzg2NDgzMQ== simonw 9599 2020-12-18T04:46:18Z 2020-12-18T04:46:18Z OWNER

The homepage currently performs a massive flurry of permission checks - one for each database, table and view: https://github.com/simonw/datasette/blob/0.53/datasette/views/index.py#L21-L75

A paginated version of this is a little daunting as the permission checks would have to be carried out against every single table just to calculate the count that will be paginated.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Maintain an in-memory SQLite table of connected databases and their tables 770436876  
747864080 https://github.com/simonw/datasette/issues/1150#issuecomment-747864080 https://api.github.com/repos/simonw/datasette/issues/1150 MDEyOklzc3VlQ29tbWVudDc0Nzg2NDA4MA== simonw 9599 2020-12-18T04:43:29Z 2020-12-18T04:43:29Z OWNER

I may be overthinking that problem. Many queries are fast in SQLite. If a Datasette instance has 1,000 connected tables will even that be a performance problem for permission checks? I should benchmark to find out.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Maintain an in-memory SQLite table of connected databases and their tables 770436876  
747862001 https://github.com/simonw/datasette/issues/1150#issuecomment-747862001 https://api.github.com/repos/simonw/datasette/issues/1150 MDEyOklzc3VlQ29tbWVudDc0Nzg2MjAwMQ== simonw 9599 2020-12-18T04:35:34Z 2020-12-18T04:35:34Z OWNER

I do need to solve the permissions problem properly though, because one of the goals of this system is to provide a paginated, searchable list of databases and tables for the homepage of the instance - #991.

As such, the homepage will need to be able to display only the tables and databases that the user has permission to view.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Maintain an in-memory SQLite table of connected databases and their tables 770436876  
747861556 https://github.com/simonw/datasette/issues/1150#issuecomment-747861556 https://api.github.com/repos/simonw/datasette/issues/1150 MDEyOklzc3VlQ29tbWVudDc0Nzg2MTU1Ng== simonw 9599 2020-12-18T04:33:45Z 2020-12-18T04:33:45Z OWNER

One solution on permissions: if Datasette had an efficient way of saying "list the tables that this user has access to" I could use that as a filter any time the user views the schema information. The implementation could be tricky though.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Maintain an in-memory SQLite table of connected databases and their tables 770436876  
747861357 https://github.com/simonw/datasette/issues/1150#issuecomment-747861357 https://api.github.com/repos/simonw/datasette/issues/1150 MDEyOklzc3VlQ29tbWVudDc0Nzg2MTM1Nw== simonw 9599 2020-12-18T04:32:52Z 2020-12-18T04:32:52Z OWNER

I need to figure out how this will interact with Datasette permissions.

If some tables are private, but others are public, should users be able to see the private tables listed in the schema metadata?

If not, how can that mechanism work?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Maintain an in-memory SQLite table of connected databases and their tables 770436876  
747833639 https://github.com/simonw/datasette/issues/1150#issuecomment-747833639 https://api.github.com/repos/simonw/datasette/issues/1150 MDEyOklzc3VlQ29tbWVudDc0NzgzMzYzOQ== simonw 9599 2020-12-18T02:49:40Z 2020-12-18T03:52:12Z OWNER

I'm going to use five tables to start off with:

  • databases - a list of databases. Each one has a name, path (if it's on disk), is_memory, schema_version
  • tables - a list of tables. Each row is database_name, table_name, sql (the create table statement) - may add more columns in the future, in particular maybe a last_row_count to cache results of counting the rows.
  • columns - a list of columns. It's the output of pragma_table_xinfo with the database_name and table_name columns added at the beginning.
  • foreign_keys - a list of foreign keys - pragma_foreign_key_list output plus database_name and table_name.
  • indexes - a list of indexes - pragma_index_list output plus database_name and table_name.
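
A rough sketch of what those five tables could look like, assuming the column names simply mirror what PRAGMA table_xinfo, PRAGMA foreign_key_list and PRAGMA index_list return. The helper name create_schema and the exact column types are my own guesses for illustration, not a settled design:

```python
import sqlite3

# Column names that clash with SQL keywords are double-quoted.
SCHEMA = """
create table databases (
    name text primary key, path text, is_memory integer, schema_version integer
);
create table tables (
    database_name text, table_name text, sql text,
    primary key (database_name, table_name)
);
create table columns (
    database_name text, table_name text,
    cid integer, name text, type text, "notnull" integer,
    dflt_value text, pk integer, hidden integer
);
create table foreign_keys (
    database_name text, table_name text,
    id integer, seq integer, "table" text, "from" text, "to" text,
    on_update text, on_delete text, "match" text
);
create table indexes (
    database_name text, table_name text,
    seq integer, name text, "unique" integer, origin text, "partial" integer
);
"""


def create_schema(conn):
    """Create the five (hypothetical) catalog tables on the given connection."""
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")
create_schema(conn)
table_names = sorted(
    row[0] for row in conn.execute("select name from sqlite_master where type = 'table'")
)
```

Prefixing every row with database_name and table_name means a single query can answer things like "which tables have a tags column" across all connected databases.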
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Maintain an in-memory SQLite table of connected databases and their tables 770436876  
747847405 https://github.com/simonw/datasette/issues/1150#issuecomment-747847405 https://api.github.com/repos/simonw/datasette/issues/1150 MDEyOklzc3VlQ29tbWVudDc0Nzg0NzQwNQ== simonw 9599 2020-12-18T03:36:04Z 2020-12-18T03:36:04Z OWNER

I could have another table that stores the combined rows from sqlite_master on every connected database so I have a copy of the schema SQL.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Maintain an in-memory SQLite table of connected databases and their tables 770436876  
747847180 https://github.com/simonw/datasette/issues/1150#issuecomment-747847180 https://api.github.com/repos/simonw/datasette/issues/1150 MDEyOklzc3VlQ29tbWVudDc0Nzg0NzE4MA== simonw 9599 2020-12-18T03:35:15Z 2020-12-18T03:35:15Z OWNER

Simpler implementation idea: a Datasette method .refresh_schemas() which loops through all known databases, checks their schema version and updates the in-memory schemas database if they have changed.
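
Roughly what that could look like, as a standalone function rather than a real Datasette method. Everything here is an assumption for illustration: databases is a plain dict of name to open sqlite3 connection, seen_versions is a dict used as the version cache, and only a tables table is maintained:

```python
import sqlite3


def refresh_schemas(internal, databases, seen_versions):
    """Update the in-memory catalog for databases whose schema_version changed.

    internal: connection to the in-memory schemas database
    databases: dict of database name -> open sqlite3 connection (hypothetical shape)
    seen_versions: dict of database name -> last schema_version seen
    Returns the list of database names that were refreshed.
    """
    internal.execute(
        "create table if not exists tables ("
        "database_name text, table_name text, sql text, "
        "primary key (database_name, table_name))"
    )
    refreshed = []
    for name, conn in databases.items():
        version = conn.execute("PRAGMA schema_version").fetchone()[0]
        if seen_versions.get(name) == version:
            continue  # nothing changed since the last check
        rows = conn.execute(
            "select name, sql from sqlite_master where type = 'table'"
        ).fetchall()
        with internal:  # replace this database's rows in one transaction
            internal.execute("delete from tables where database_name = ?", (name,))
            internal.executemany(
                "insert into tables values (?, ?, ?)",
                [(name, table, sql) for table, sql in rows],
            )
        seen_versions[name] = version
        refreshed.append(name)
    return refreshed
```

Calling it twice in a row without a schema change should do no work on the second call, which is the whole point of comparing schema versions.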

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Maintain an in-memory SQLite table of connected databases and their tables 770436876  
747834762 https://github.com/simonw/datasette/issues/1150#issuecomment-747834762 https://api.github.com/repos/simonw/datasette/issues/1150 MDEyOklzc3VlQ29tbWVudDc0NzgzNDc2Mg== simonw 9599 2020-12-18T02:53:22Z 2020-12-18T02:53:22Z OWNER

I think I'm going to have to build this without using the pragma_x() SQL functions as they were only added in SQLite 3.16.0 (released 2017-01-02) and I've seen plenty of Datasette instances running on older versions of SQLite.
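
A sketch of the alternative: loop over the tables in Python and run the plain PRAGMA statements, which work on older SQLite versions. The function name and the database_name argument are hypothetical, chosen just for this example:

```python
import sqlite3


def foreign_keys_and_indexes(conn, database_name):
    """Collect foreign key and index rows for every table using plain PRAGMAs.

    Each returned row is prefixed with (database_name, table_name).
    """
    tables = [
        row[0]
        for row in conn.execute("select name from sqlite_master where type = 'table'")
    ]
    foreign_keys, indexes = [], []
    for table in tables:
        # Double-quote the table name so names with spaces or dots still work.
        quoted = '"' + table.replace('"', '""') + '"'
        for row in conn.execute(f"PRAGMA foreign_key_list({quoted})"):
            foreign_keys.append((database_name, table) + tuple(row))
        for row in conn.execute(f"PRAGMA index_list({quoted})"):
            indexes.append((database_name, table) + tuple(row))
    return foreign_keys, indexes
```

This costs two extra statements per table instead of one join, but it avoids depending on the table-valued PRAGMA functions.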

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Maintain an in-memory SQLite table of connected databases and their tables 770436876  
747834462 https://github.com/simonw/datasette/issues/1150#issuecomment-747834462 https://api.github.com/repos/simonw/datasette/issues/1150 MDEyOklzc3VlQ29tbWVudDc0NzgzNDQ2Mg== simonw 9599 2020-12-18T02:52:19Z 2020-12-18T02:52:26Z OWNER

Maintaining this database will be the responsibility of a subclass of Database called _SchemaDatabase which will be managed by the Datasette instance.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Maintain an in-memory SQLite table of connected databases and their tables 770436876  
747834113 https://github.com/simonw/datasette/issues/1150#issuecomment-747834113 https://api.github.com/repos/simonw/datasette/issues/1150 MDEyOklzc3VlQ29tbWVudDc0NzgzNDExMw== simonw 9599 2020-12-18T02:51:13Z 2020-12-18T02:51:20Z OWNER

SQLite uses indexes rather than indices as the plural, so I'll go with that: https://sqlite.org/lang_createindex.html

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Maintain an in-memory SQLite table of connected databases and their tables 770436876  
747809670 https://github.com/simonw/datasette/issues/1150#issuecomment-747809670 https://api.github.com/repos/simonw/datasette/issues/1150 MDEyOklzc3VlQ29tbWVudDc0NzgwOTY3MA== simonw 9599 2020-12-18T01:29:30Z 2020-12-18T01:29:30Z OWNER

I've been rediscovering the pattern I already documented in this TIL: https://github.com/simonw/til/blob/main/sqlite/list-all-columns-in-a-database.md#better-alternative-using-a-join

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Maintain an in-memory SQLite table of connected databases and their tables 770436876  
747807891 https://github.com/simonw/datasette/issues/1150#issuecomment-747807891 https://api.github.com/repos/simonw/datasette/issues/1150 MDEyOklzc3VlQ29tbWVudDc0NzgwNzg5MQ== simonw 9599 2020-12-18T01:23:59Z 2020-12-18T01:23:59Z OWNER

https://www.sqlite.org/pragma.html#pragfunc says:

  • This feature is experimental and is subject to change. Further documentation will become available if and when the table-valued functions for PRAGMAs feature becomes officially supported.
  • The table-valued functions for PRAGMA feature was added in SQLite version 3.16.0 (2017-01-02). Prior versions of SQLite cannot use this feature.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Maintain an in-memory SQLite table of connected databases and their tables 770436876  
747807289 https://github.com/simonw/datasette/issues/1150#issuecomment-747807289 https://api.github.com/repos/simonw/datasette/issues/1150 MDEyOklzc3VlQ29tbWVudDc0NzgwNzI4OQ== simonw 9599 2020-12-18T01:22:05Z 2020-12-18T01:22:05Z OWNER

Here's a simpler query pattern (not using CTEs so should work on older versions of SQLite) - this one lists all indexes for all tables:

```sql
select
  sqlite_master.name as 'table',
  indexes.*
from
  sqlite_master
  join pragma_index_list(sqlite_master.name) indexes
where
  sqlite_master.type = 'table'
```

https://latest.datasette.io/fixtures?sql=select%0D%0A++sqlite_master.name+as+%27table%27%2C%0D%0A++indexes.*%0D%0Afrom%0D%0A++sqlite_master%0D%0A++join+pragma_index_list%28sqlite_master.name%29+indexes%0D%0Awhere%0D%0A++sqlite_master.type+%3D+%27table%27

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Maintain an in-memory SQLite table of connected databases and their tables 770436876  
747805275 https://github.com/simonw/datasette/issues/1150#issuecomment-747805275 https://api.github.com/repos/simonw/datasette/issues/1150 MDEyOklzc3VlQ29tbWVudDc0NzgwNTI3NQ== simonw 9599 2020-12-18T01:15:27Z 2020-12-18T01:16:17Z OWNER

This query uses a join to pull foreign key information for every table: https://latest.datasette.io/fixtures?sql=with+tables+as+%28%0D%0A++select%0D%0A++++name%0D%0A++from%0D%0A++++sqlite_master%0D%0A++where%0D%0A++++type+%3D+%27table%27%0D%0A%29%0D%0Aselect%0D%0A++tables.name+as+%27table%27%2C%0D%0A++foo.*%0D%0Afrom%0D%0A++tables%0D%0A++join+pragma_foreign_key_list%28tables.name%29+foo

```sql
with tables as (
  select
    name
  from
    sqlite_master
  where
    type = 'table'
)
select
  tables.name as 'table',
  foo.*
from
  tables
  join pragma_foreign_key_list(tables.name) foo
```

Same query for pragma_table_xinfo: https://latest.datasette.io/fixtures?sql=with+tables+as+%28%0D%0A++select%0D%0A++++name%0D%0A++from%0D%0A++++sqlite_master%0D%0A++where%0D%0A++++type+%3D+%27table%27%0D%0A%29%0D%0Aselect%0D%0A++tables.name+as+%27table%27%2C%0D%0A++foo.*%0D%0Afrom%0D%0A++tables%0D%0A++join+pragma_table_xinfo%28tables.name%29+foo

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Maintain an in-memory SQLite table of connected databases and their tables 770436876  
747804254 https://github.com/simonw/datasette/issues/1150#issuecomment-747804254 https://api.github.com/repos/simonw/datasette/issues/1150 MDEyOklzc3VlQ29tbWVudDc0NzgwNDI1NA== simonw 9599 2020-12-18T01:12:13Z 2020-12-18T01:12:13Z OWNER

Prototype: https://latest.datasette.io/fixtures?sql=select+%27facetable%27+as+%27table%27%2C+*+from+pragma_table_xinfo%28%27facetable%27%29%0D%0Aunion%0D%0Aselect+%27searchable%27+as+%27table%27%2C+*+from+pragma_table_xinfo%28%27searchable%27%29%0D%0Aunion%0D%0Aselect+%27compound_three_primary_keys%27+as+%27table%27%2C+*+from+pragma_table_xinfo%28%27compound_three_primary_keys%27%29

```sql
select 'facetable' as 'table', * from pragma_table_xinfo('facetable')
union
select 'searchable' as 'table', * from pragma_table_xinfo('searchable')
union
select 'compound_three_primary_keys' as 'table', * from pragma_table_xinfo('compound_three_primary_keys')
```

table | cid | name | type | notnull | dflt_value | pk | hidden
-- | -- | -- | -- | -- | -- | -- | --
compound_three_primary_keys | 0 | pk1 | varchar(30) | 0 |   | 1 | 0
compound_three_primary_keys | 1 | pk2 | varchar(30) | 0 |   | 2 | 0
compound_three_primary_keys | 2 | pk3 | varchar(30) | 0 |   | 3 | 0
compound_three_primary_keys | 3 | content | text | 0 |   | 0 | 0
facetable | 0 | pk | integer | 0 |   | 1 | 0
facetable | 1 | created | text | 0 |   | 0 | 0
facetable | 2 | planet_int | integer | 0 |   | 0 | 0
facetable | 3 | on_earth | integer | 0 |   | 0 | 0
facetable | 4 | state | text | 0 |   | 0 | 0
facetable | 5 | city_id | integer | 0 |   | 0 | 0
facetable | 6 | neighborhood | text | 0 |   | 0 | 0
facetable | 7 | tags | text | 0 |   | 0 | 0
facetable | 8 | complex_array | text | 0 |   | 0 | 0
facetable | 9 | distinct_some_null |   | 0 |   | 0 | 0
searchable | 0 | pk | integer | 0 |   | 1 | 0
searchable | 1 | text1 | text | 0 |   | 0 | 0
searchable | 2 | text2 | text | 0 |   | 0 | 0
searchable | 3 | name with . and spaces | text | 0 |   | 0 | 0

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Maintain an in-memory SQLite table of connected databases and their tables 770436876  
747803268 https://github.com/simonw/datasette/issues/1150#issuecomment-747803268 https://api.github.com/repos/simonw/datasette/issues/1150 MDEyOklzc3VlQ29tbWVudDc0NzgwMzI2OA== simonw 9599 2020-12-18T01:08:40Z 2020-12-18T01:08:40Z OWNER

Next step: design a schema for the in-memory database table that exposes all of the tables.

I want to support things like:

  • Show me all of the tables
  • Show me the columns in a table
  • Show me all tables that contain a tags column
  • Show me the indexes
  • Show me every table configured for full-text search

Maybe a starting point would be to build concrete tables using the results of things like PRAGMA foreign_key_list(table) and PRAGMA table_xinfo(table) - note though that table_xinfo requires SQLite 3.26.0 or higher, as shown here: https://github.com/simonw/datasette/blob/5e9895c67f08e9f42acedd3d6d29512ac446e15f/datasette/utils/__init__.py#L563-L579
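
A minimal sketch of handling that version requirement, assuming a fallback to PRAGMA table_info (which does not report hidden columns) is acceptable on older SQLite builds. The helper name table_columns is made up for this example and is not the code at the link above:

```python
import sqlite3


def table_columns(conn, table):
    """Return column rows for a table, including hidden columns when supported."""
    # table_xinfo needs SQLite 3.26.0 or higher; table_info works on older versions.
    if sqlite3.sqlite_version_info >= (3, 26, 0):
        pragma = "table_xinfo"
    else:
        pragma = "table_info"
    # Double-quote the table name so unusual names are handled.
    quoted = '"' + table.replace('"', '""') + '"'
    return conn.execute(f"PRAGMA {pragma}({quoted})").fetchall()
```

Note the two PRAGMAs return different numbers of columns (table_xinfo adds hidden), so any code storing the rows has to cope with both shapes.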

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Maintain an in-memory SQLite table of connected databases and their tables 770436876  
747768112 https://github.com/simonw/datasette/issues/1150#issuecomment-747768112 https://api.github.com/repos/simonw/datasette/issues/1150 MDEyOklzc3VlQ29tbWVudDc0Nzc2ODExMg== simonw 9599 2020-12-17T23:25:21Z 2020-12-17T23:25:21Z OWNER

Next challenge: figure out how to use the Database class from https://github.com/simonw/datasette/blob/0.53/datasette/database.py for an in-memory database which persists data for the duration of the lifetime of the server, and allows access to that in-memory database from multiple threads in a way that lets them see each other's changes.
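
One mechanism SQLite offers for this is a named in-memory database opened through a shared-cache URI. A minimal sketch using Python's sqlite3 module directly; the name _schemas_demo and the helper functions are invented for illustration, and this is not necessarily how the Database class will end up doing it:

```python
import sqlite3
import threading

# Every connection that uses this exact URI sees the same in-memory database.
MEMORY_URI = "file:_schemas_demo?mode=memory&cache=shared"


def connect():
    return sqlite3.connect(MEMORY_URI, uri=True)


def demo():
    # The database only exists while at least one connection is open,
    # so a long-lived "keeper" connection holds it for the server's lifetime.
    keeper = connect()
    try:
        keeper.execute("create table tables (database_name text, table_name text)")
        keeper.execute("insert into tables values ('fixtures', 'facetable')")
        keeper.commit()

        results = []

        def read_from_other_thread():
            # A separate connection in a separate thread sees the committed row.
            conn = connect()
            try:
                results.extend(
                    conn.execute("select database_name, table_name from tables")
                )
            finally:
                conn.close()

        thread = threading.Thread(target=read_from_other_thread)
        thread.start()
        thread.join()
        return results
    finally:
        keeper.close()
```

Each thread opens its own connection here, which sidesteps sqlite3's default check that a connection is only used from the thread that created it.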

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Maintain an in-memory SQLite table of connected databases and their tables 770436876  
747767598 https://github.com/simonw/datasette/issues/1150#issuecomment-747767598 https://api.github.com/repos/simonw/datasette/issues/1150 MDEyOklzc3VlQ29tbWVudDc0Nzc2NzU5OA== simonw 9599 2020-12-17T23:24:03Z 2020-12-17T23:24:03Z OWNER

I'm going to assume that even the heaviest user will have trouble going beyond a few hundred database files, so this is fine.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Maintain an in-memory SQLite table of connected databases and their tables 770436876  
747767499 https://github.com/simonw/datasette/issues/1150#issuecomment-747767499 https://api.github.com/repos/simonw/datasette/issues/1150 MDEyOklzc3VlQ29tbWVudDc0Nzc2NzQ5OQ== simonw 9599 2020-12-17T23:23:44Z 2020-12-17T23:23:44Z OWNER

Grabbing the schema version of 380 files in the root directory takes 70ms.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Maintain an in-memory SQLite table of connected databases and their tables 770436876  
747767055 https://github.com/simonw/datasette/issues/1150#issuecomment-747767055 https://api.github.com/repos/simonw/datasette/issues/1150 MDEyOklzc3VlQ29tbWVudDc0Nzc2NzA1NQ== simonw 9599 2020-12-17T23:22:41Z 2020-12-17T23:22:41Z OWNER

It's just recursion that's expensive. I created 380 empty SQLite databases in a folder and timed list(pathlib.Path("/tmp").glob("*.db")) - it took 0.002s.

So maybe I tell users that all SQLite databases have to be in the root folder.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Maintain an in-memory SQLite table of connected databases and their tables 770436876  
747766310 https://github.com/simonw/datasette/issues/1150#issuecomment-747766310 https://api.github.com/repos/simonw/datasette/issues/1150 MDEyOklzc3VlQ29tbWVudDc0Nzc2NjMxMA== simonw 9599 2020-12-17T23:20:49Z 2020-12-17T23:20:49Z OWNER

I tried against my entire ~/Dropbox/Development folder - deeply nested with 381 SQLite database files in sub-folders - and it took 25s! But it turned out 23.9s of that was the call to pathlib.Path("/Users/simon/Dropbox/Development").glob('**/*.db').

So it looks like connecting to a SQLite database file and getting the schema version is extremely fast. Scanning directories is slower.
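
A small sketch for separating the two costs, timing only the directory scan for a flat pattern versus a recursive one. The helper name time_glob is hypothetical, and the numbers will depend entirely on the folder being scanned:

```python
import pathlib
import time


def time_glob(root, pattern):
    """Return (number of matches, seconds taken) for a glob under root."""
    start = time.perf_counter()
    paths = list(pathlib.Path(root).glob(pattern))
    return len(paths), time.perf_counter() - start


# Flat scan of one folder versus a recursive scan of everything beneath it.
flat_count, flat_seconds = time_glob(".", "*.db")
deep_count, deep_seconds = time_glob(".", "**/*.db")
```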

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Maintain an in-memory SQLite table of connected databases and their tables 770436876  
747764712 https://github.com/simonw/datasette/issues/1150#issuecomment-747764712 https://api.github.com/repos/simonw/datasette/issues/1150 MDEyOklzc3VlQ29tbWVudDc0Nzc2NDcxMg== simonw 9599 2020-12-17T23:16:31Z 2020-12-17T23:16:31Z OWNER

Quick micro-benchmark, run against a folder with 46 database files adding up to 1.4GB total:

```python
import pathlib, sqlite3, time

paths = list(pathlib.Path(".").glob('*.db'))

def schema_version(path):
    db = sqlite3.connect(path)
    version = db.execute("PRAGMA schema_version").fetchall()[0]
    db.close()
    return version

def all():
    versions = {}
    for path in paths:
        versions[path.name] = schema_version(path)
    return versions

start = time.time(); all(); print(time.time() - start)

# 0.012346982955932617
```

So that's 12ms.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Maintain an in-memory SQLite table of connected databases and their tables 770436876  
747754229 https://github.com/simonw/datasette/issues/1150#issuecomment-747754229 https://api.github.com/repos/simonw/datasette/issues/1150 MDEyOklzc3VlQ29tbWVudDc0Nzc1NDIyOQ== simonw 9599 2020-12-17T23:04:38Z 2020-12-17T23:04:38Z OWNER

Open question: will this work for hundreds of database files, or is the overhead of connecting to each of 100 databases in turn to run PRAGMA schema_version too high?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Maintain an in-memory SQLite table of connected databases and their tables 770436876  
747754082 https://github.com/simonw/datasette/issues/1150#issuecomment-747754082 https://api.github.com/repos/simonw/datasette/issues/1150 MDEyOklzc3VlQ29tbWVudDc0Nzc1NDA4Mg== simonw 9599 2020-12-17T23:04:13Z 2020-12-17T23:04:13Z OWNER

Pages that need a list of all databases - the index page and /-/databases for example - could trigger a "check for new database files in the configured directories" scan.

That scan would run at most once every 5 (n) seconds - the check is triggered by those pages, but if it has run more recently than that it doesn't run again.

Hopefully this means it could be done as a blocking operation, rather than trying to run it in a thread.

When it runs it scans for .db or .sqlite files (maybe one or two other extensions) that it hasn’t seen before. It also checks that the existing list of known database files still exists.

If it finds any new ones it connects to them once to run .schema. It also runs PRAGMA schema_version on each known database so that it can compare the schema version number to the last one it saw. That's how it detects if there are new tables or if the cached schema needs to be updated.
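
A sketch of how that throttled, blocking scan might hang together. The class name DirectoryScanner, its attributes and the 5 second default are all assumptions for illustration; only the PRAGMA schema_version comparison comes from the description above:

```python
import pathlib
import sqlite3
import time


def schema_version(path):
    conn = sqlite3.connect(str(path))
    try:
        return conn.execute("PRAGMA schema_version").fetchone()[0]
    finally:
        conn.close()


class DirectoryScanner:
    """Hypothetical helper tracking database files and their schema versions."""

    def __init__(self, directory, interval=5.0):
        self.directory = pathlib.Path(directory)
        self.interval = interval  # minimum seconds between scans
        self.last_scan = None
        self.versions = {}  # path -> schema_version seen on the last scan

    def scan_if_due(self):
        """Scan unless a scan ran within the interval; returns None if skipped."""
        now = time.monotonic()
        if self.last_scan is not None and now - self.last_scan < self.interval:
            return None
        self.last_scan = now
        found = {
            path
            for pattern in ("*.db", "*.sqlite")
            for path in self.directory.glob(pattern)
        }
        added = found - set(self.versions)
        removed = set(self.versions) - found
        changed = set()
        for path in found:
            version = schema_version(path)
            if path in self.versions and self.versions[path] != version:
                changed.add(path)  # cached schema for this file is stale
            self.versions[path] = version
        for path in removed:
            del self.versions[path]
        return {"added": added, "removed": removed, "changed": changed}
```

A page handler would call scan_if_due() before listing databases and only rebuild cached schema details for the added and changed paths.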

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Maintain an in-memory SQLite table of connected databases and their tables 770436876  

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
, [performed_via_github_app] TEXT);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);