issue_comments


21 rows where "created_at" is on date 2020-12-17 and user = 9599, sorted by updated_at descending


issue 7

  • Maintain an in-memory SQLite table of connected databases and their tables 8
  • Database class mechanism for cross-connection in-memory databases 7
  • Make it easier to theme Datasette with CSS 2
  • Paginate + search for databases/tables on the homepage 1
  • Replace "datasette publish --extra-options" with "--setting" 1
  • Remove xfail tests when new httpx is released 1
  • killed by oomkiller on large location-history 1

author_association 2

  • OWNER 20
  • MEMBER 1

user 1

  • simonw · 21
747775245 · simonw (OWNER) · created 2020-12-17T23:43:41Z · updated 2020-12-17T23:56:27Z
https://github.com/simonw/datasette/issues/1151#issuecomment-747775245

I'm going to add an argument to the `Database()` constructor which means "connect to the named in-memory database called X":

```python
db = Database(ds, memory_name="datasette")
```
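A minimal sketch of how that argument could map onto SQLite's shared in-memory URIs - the `memory_name` handling here is my guess at the mechanism, not Datasette's actual implementation:

```python
import sqlite3

class Database:
    def __init__(self, ds, path=None, memory_name=None):
        self.ds = ds
        self.path = path
        self.memory_name = memory_name

    def connect(self):
        if self.memory_name:
            # Every connection that opens the same mode=memory&cache=shared URI
            # sees the same named in-memory database.
            uri = "file:{}?mode=memory&cache=shared".format(self.memory_name)
            return sqlite3.connect(uri, uri=True, check_same_thread=False)
        return sqlite3.connect(self.path or ":memory:")
```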

Reactions: none · Issue: Database class mechanism for cross-connection in-memory databases (770448622)
747779056 · simonw (OWNER) · 2020-12-17T23:55:57Z
https://github.com/simonw/datasette/issues/1151#issuecomment-747779056

Wait, I do use it - if you run `datasette --memory` - which is useful for trying out SQL that doesn't need to run against a table.

Reactions: none · Issue: Database class mechanism for cross-connection in-memory databases (770448622)
747775792 · simonw (OWNER) · 2020-12-17T23:45:20Z
https://github.com/simonw/datasette/issues/1151#issuecomment-747775792

Do I use the current `is_memory=` boolean anywhere at the moment?

https://ripgrep.datasette.io/-/ripgrep?pattern=is_memory - doesn't look like it.

I may remove that feature, since it's not actually useful, and replace it with a mechanism for creating shared named memory databases instead.

Reactions: none · Issue: Database class mechanism for cross-connection in-memory databases (770448622)
747774855 · simonw (OWNER) · 2020-12-17T23:42:34Z
https://github.com/simonw/datasette/issues/1151#issuecomment-747774855

This worked as a prototype:

```diff
diff --git a/datasette/database.py b/datasette/database.py
index 412e0c5..a90e617 100644
--- a/datasette/database.py
+++ b/datasette/database.py
@@ -24,11 +24,12 @@ connections = threading.local()
 
 
 class Database:
-    def __init__(self, ds, path=None, is_mutable=False, is_memory=False):
+    def __init__(self, ds, path=None, is_mutable=False, is_memory=False, uri=None):
         self.ds = ds
         self.path = path
         self.is_mutable = is_mutable
         self.is_memory = is_memory
+        self.uri = uri
         self.hash = None
         self.cached_size = None
         self.cached_table_counts = None
@@ -46,6 +47,8 @@ class Database:
         }
 
     def connect(self, write=False):
+        if self.uri:
+            return sqlite3.connect(self.uri, uri=True, check_same_thread=False)
         if self.is_memory:
             return sqlite3.connect(":memory:")
         # mode=ro or immutable=1?
```

Then in `ipython`:

```python
from datasette.app import Datasette
from datasette.database import Database

ds = Datasette([])
db = Database(ds, uri="file:datasette?mode=memory&cache=shared", is_memory=True)
await db.execute_write("create table foo (bar text)")
await db.table_names()
```

Outputs `["foo"]`.

```python
db2 = Database(ds, uri="file:datasette?mode=memory&cache=shared", is_memory=True)
await db2.table_names()
```

Also outputs `["foo"]`.

Reactions: none · Issue: Database class mechanism for cross-connection in-memory databases (770448622)
747770581 · simonw (OWNER) · created 2020-12-17T23:31:18Z · updated 2020-12-17T23:32:07Z
https://github.com/simonw/datasette/issues/1151#issuecomment-747770581

This works in `ipython`:

```
In [1]: import sqlite3

In [2]: c1 = sqlite3.connect("file:datasette?mode=memory&cache=shared", uri=True)

In [3]: c2 = sqlite3.connect("file:datasette?mode=memory&cache=shared", uri=True)

In [4]: c1.executescript("CREATE TABLE hello (world TEXT)")
Out[4]: <sqlite3.Cursor at 0x1104addc0>

In [5]: c1.execute("select * from sqlite_master").fetchall()
Out[5]: [('table', 'hello', 'hello', 2, 'CREATE TABLE hello (world TEXT)')]

In [6]: c2.execute("select * from sqlite_master").fetchall()
Out[6]: [('table', 'hello', 'hello', 2, 'CREATE TABLE hello (world TEXT)')]

In [7]: c3 = sqlite3.connect("file:datasette?mode=memory&cache=shared", uri=True)

In [9]: c3.execute("select * from sqlite_master").fetchall()
Out[9]: [('table', 'hello', 'hello', 2, 'CREATE TABLE hello (world TEXT)')]

In [10]: c4 = sqlite3.connect("file:datasette?mode=memory", uri=True)

In [11]: c4.execute("select * from sqlite_master").fetchall()
Out[11]: []
```

Reactions: none · Issue: Database class mechanism for cross-connection in-memory databases (770448622)
747770082 · simonw (OWNER) · 2020-12-17T23:29:53Z
https://github.com/simonw/datasette/issues/1151#issuecomment-747770082

I'm going to try with `file:datasette?mode=memory&cache=shared`.

Reactions: none · Issue: Database class mechanism for cross-connection in-memory databases (770448622)
747769830 · simonw (OWNER) · 2020-12-17T23:29:08Z
https://github.com/simonw/datasette/issues/1151#issuecomment-747769830

https://sqlite.org/inmemorydb.html

> The database ceases to exist as soon as the database connection is closed. Every :memory: database is distinct from every other. So, opening two database connections each with the filename ":memory:" will create two independent in-memory databases.
>
> [...]
>
> The special ":memory:" filename also works when using URI filenames. For example:
>
>     rc = sqlite3_open("file::memory:", &db);
>
> [...]
>
> However, the same in-memory database can be opened by two or more database connections as follows:
>
>     rc = sqlite3_open("file::memory:?cache=shared", &db);
>
> [...] If two or more distinct but shareable in-memory databases are needed in a single process, then the mode=memory query parameter can be used with a URI filename to create a named in-memory database:
>
>     rc = sqlite3_open("file:memdb1?mode=memory&cache=shared", &db);
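A short Python sketch of the distinction the documentation is drawing - `memdb1` and `memdb2` are arbitrary illustrative names:

```python
import sqlite3

# Two connections to the same named in-memory database share its contents...
a1 = sqlite3.connect("file:memdb1?mode=memory&cache=shared", uri=True)
a2 = sqlite3.connect("file:memdb1?mode=memory&cache=shared", uri=True)
a1.execute("CREATE TABLE t (x TEXT)")
print(a2.execute("SELECT name FROM sqlite_master").fetchall())  # [('t',)]

# ...while a different name gives a completely separate database.
b = sqlite3.connect("file:memdb2?mode=memory&cache=shared", uri=True)
print(b.execute("SELECT name FROM sqlite_master").fetchall())  # []
```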
Reactions: none · Issue: Database class mechanism for cross-connection in-memory databases (770448622)
747768112 · simonw (OWNER) · 2020-12-17T23:25:21Z
https://github.com/simonw/datasette/issues/1150#issuecomment-747768112

Next challenge: figure out how to use the Database class from https://github.com/simonw/datasette/blob/0.53/datasette/database.py for an in-memory database which persists data for the lifetime of the server, and allows access to that in-memory database from multiple threads in a way that lets them see each other's changes.

Reactions: none · Issue: Maintain an in-memory SQLite table of connected databases and their tables (770436876)
747767598 · simonw (OWNER) · 2020-12-17T23:24:03Z
https://github.com/simonw/datasette/issues/1150#issuecomment-747767598

I'm going to assume that even the heaviest user will have trouble going beyond a few hundred database files, so this is fine.

Reactions: none · Issue: Maintain an in-memory SQLite table of connected databases and their tables (770436876)
747767499 · simonw (OWNER) · 2020-12-17T23:23:44Z
https://github.com/simonw/datasette/issues/1150#issuecomment-747767499

Grabbing the schema version of 380 files in the root directory takes 70ms.

Reactions: none · Issue: Maintain an in-memory SQLite table of connected databases and their tables (770436876)
747767055 · simonw (OWNER) · 2020-12-17T23:22:41Z
https://github.com/simonw/datasette/issues/1150#issuecomment-747767055

It's just recursion that's expensive. I created 380 empty SQLite databases in a folder and timed `list(pathlib.Path("/tmp").glob("*.db"))` - it took 0.002s.

So maybe I tell users that all SQLite databases have to be in the root folder.
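A sketch that reproduces this measurement under the same assumptions (380 empty databases in a single flat folder; the file names and temp folder are illustrative):

```python
import pathlib, sqlite3, tempfile, time

tmp = pathlib.Path(tempfile.mkdtemp())
for i in range(380):
    db = sqlite3.connect(tmp / "db_{}.db".format(i))
    db.execute("PRAGMA user_version = 1")  # force the file to be written to disk
    db.close()

start = time.time()
paths = list(tmp.glob("*.db"))  # non-recursive: root folder only
print(len(paths), time.time() - start)
```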

Reactions: none · Issue: Maintain an in-memory SQLite table of connected databases and their tables (770436876)
747766310 · simonw (OWNER) · 2020-12-17T23:20:49Z
https://github.com/simonw/datasette/issues/1150#issuecomment-747766310

I tried against my entire ~/Development/Dropbox folder - deeply nested, with 381 SQLite database files in sub-folders - and it took 25s! But it turned out 23.9s of that was the call to `pathlib.Path("/Users/simon/Dropbox/Development").glob('**/*.db')`.

So connecting to a SQLite database file and getting the schema version is extremely fast - scanning directories is what's slow.

Reactions: none · Issue: Maintain an in-memory SQLite table of connected databases and their tables (770436876)
747764712 · simonw (OWNER) · 2020-12-17T23:16:31Z
https://github.com/simonw/datasette/issues/1150#issuecomment-747764712

Quick micro-benchmark, run against a folder with 46 database files adding up to 1.4GB total:

```python
import pathlib, sqlite3, time

paths = list(pathlib.Path(".").glob('*.db'))

def schema_version(path):
    db = sqlite3.connect(path)
    version = db.execute("PRAGMA schema_version").fetchall()[0]
    db.close()
    return version

def all():
    versions = {}
    for path in paths:
        versions[path.name] = schema_version(path)
    return versions

start = time.time(); all(); print(time.time() - start)
```

That prints `0.012346982955932617` - so that's 12ms.

Reactions: none · Issue: Maintain an in-memory SQLite table of connected databases and their tables (770436876)
747754229 · simonw (OWNER) · 2020-12-17T23:04:38Z
https://github.com/simonw/datasette/issues/1150#issuecomment-747754229

Open question: will this work for hundreds of database files, or is the overhead of connecting to each of 100 databases in turn to run `PRAGMA schema_version` too high?

Reactions: none · Issue: Maintain an in-memory SQLite table of connected databases and their tables (770436876)
747754082 · simonw (OWNER) · 2020-12-17T23:04:13Z
https://github.com/simonw/datasette/issues/1150#issuecomment-747754082

Pages that need a list of all databases - the index page and /-/databases for example - could trigger a "check for new database files in the configured directories" scan.

That scan would run at most once every n (say 5) seconds - if it has already run more recently than that, the check is skipped.

Hopefully this means it could be done as a blocking operation, rather than trying to run it in a thread.

When it runs it scans for .db or .sqlite files (maybe one or two other extensions) that it hasn't seen before. It also checks that every file in the existing list of known databases still exists.

If it finds any new ones it connects to them once to run `.schema`. It also runs `PRAGMA schema_version` on each known database so that it can compare the schema version number to the last one it saw - that's how it detects new tables or a cached schema that needs updating.
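A minimal sketch of that scan logic - the `DatabaseScanner` name and its attributes are hypothetical, not part of Datasette:

```python
import pathlib, sqlite3, time

class DatabaseScanner:
    def __init__(self, directories, interval=5, extensions=(".db", ".sqlite")):
        self.directories = [pathlib.Path(d) for d in directories]
        self.interval = interval
        self.extensions = extensions
        self.known = {}  # path -> schema_version last seen for that file
        self.last_scan = 0

    def maybe_scan(self):
        # Rate limit: run at most once every `interval` seconds.
        now = time.time()
        if now - self.last_scan < self.interval:
            return
        self.last_scan = now
        seen = set()
        for directory in self.directories:
            for extension in self.extensions:
                for path in directory.glob("*" + extension):  # root folder only
                    seen.add(path)
                    db = sqlite3.connect(path)
                    version = db.execute("PRAGMA schema_version").fetchone()[0]
                    db.close()
                    if self.known.get(path) != version:
                        # New file, or its schema changed: re-read its schema here.
                        self.known[path] = version
        # Drop records for database files that no longer exist.
        for path in set(self.known) - seen:
            del self.known[path]
```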

Reactions: none · Issue: Maintain an in-memory SQLite table of connected databases and their tables (770436876)
747734273 · simonw (OWNER) · 2020-12-17T22:14:46Z
https://github.com/simonw/datasette/issues/461#issuecomment-747734273

I've been thinking about this a bunch. For Datasette to be useful as a private repository of data (Datasette Library, #417) it's crucial that it can handle a much, much larger number of databases.

This makes me worry about how many connections (and open file handles) it makes sense to have open at one time.

I realize now that this is much less of a problem for private instances. Public instances on the internet could get traffic to any database at any time, so connections could easily get out of control. A private instance with only a few users could instead get away with only opening connections to databases in "active use".

This does however make it even more important for Datasette to maintain a cached set of metadata about the tables - which is also needed to power this feature.
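One way to keep connections open only for databases in "active use" is a small LRU pool - a sketch of the idea, not anything Datasette currently does, and `max_open` is an arbitrary illustrative limit:

```python
import sqlite3
from collections import OrderedDict

class ConnectionPool:
    def __init__(self, max_open=20):
        self.max_open = max_open
        self.connections = OrderedDict()  # path -> open connection

    def get(self, path):
        if path in self.connections:
            # Already open: mark it as most recently used.
            self.connections.move_to_end(path)
            return self.connections[path]
        if len(self.connections) >= self.max_open:
            # Evict the least recently used connection to cap open file handles.
            _, oldest = self.connections.popitem(last=False)
            oldest.close()
        conn = sqlite3.connect(path)
        self.connections[path] = conn
        return conn
```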

Reactions: none · Issue: Paginate + search for databases/tables on the homepage (443021509)
747209115 · simonw (OWNER) · 2020-12-17T05:11:04Z
https://github.com/simonw/datasette/issues/1005#issuecomment-747209115

Tracking ticket for the next HTTPX release is https://github.com/encode/httpx/pull/1403

Reactions: none · Issue: Remove xfail tests when new httpx is released (718259202)
747208543 · simonw (OWNER) · 2020-12-17T05:09:03Z
https://github.com/simonw/datasette/issues/741#issuecomment-747208543

I really like this in datasette-publish-vercel - I'm definitely going to bring this to the other publish implementations as well.

Reactions: none · Issue: Replace "datasette publish --extra-options" with "--setting" (607223136)
747207787 · simonw (OWNER) · 2020-12-17T05:06:16Z
https://github.com/simonw/datasette/issues/1149#issuecomment-747207787

So, an idea: what if Datasette's default CSS applied only to elements with classes - or maybe to children of a `<body class="datasette">` element - in such a way that you could write your own custom HTML that reuses elements of Datasette's CSS - the cog menu styling, for example - but only on an opt-in basis?

Reactions: +1 × 1 · Issue: Make it easier to theme Datasette with CSS (769520939)
747207487 · simonw (OWNER) · 2020-12-17T05:05:08Z
https://github.com/simonw/datasette/issues/1149#issuecomment-747207487

I think what I want is for it to be easy to reuse portions of Datasette's CSS - the bit that styles the cog menu for example - without pulling in the whole thing.

I tried linking in the `<link rel="stylesheet" href="/-/static/app.css">` stylesheet and the page broke, wildly.

That's because Datasette's built-in CSS applies styles directly to a whole bunch of different tags - body, header, footer etc - which means that if you import that stylesheet it can play havoc with the site you have already built.

Reactions: none · Issue: Make it easier to theme Datasette with CSS (769520939)
747126777 · simonw (MEMBER) · 2020-12-17T00:36:52Z
https://github.com/dogsheep/google-takeout-to-sqlite/issues/2#issuecomment-747126777

The memory profiler tricks I used in https://github.com/dogsheep/healthkit-to-sqlite/issues/7 could help figure out what's going on here.

Reactions: none · Issue: killed by oomkiller on large location-history (769376447)

Table schema:
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
, [performed_via_github_app] TEXT);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);