
issue_comments


14 rows where author_association = "OWNER", "created_at" is on date 2019-03-17 and user = 9599 sorted by updated_at descending


issue 5

  • Default to opening files in mutable mode, special option for immutable files 6
  • URL hashing now optional: turn on with --config hash_urls:1 (#418) 3
  • Hashed URLs should be optional 2
  • Fix all the places that currently use .inspect() data 2
  • Documentation for ?_hash=1 and Datasette's hashed URL caching 1

user 1

  • simonw · 14 ✖

author_association 1

  • OWNER · 14 ✖
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions issue performed_via_github_app
473726527 https://github.com/simonw/datasette/issues/419#issuecomment-473726527 https://api.github.com/repos/simonw/datasette/issues/419 MDEyOklzc3VlQ29tbWVudDQ3MzcyNjUyNw== simonw 9599 2019-03-17T23:28:41Z 2019-05-16T14:54:50Z OWNER

I've added the -i option, so this now works:

datasette -i fixtures.db

This feature is incomplete though. Some extra changes I need to make:

  • The ?_hash=1 and --config hash_urls:1 options (introduced in #418) should only work for immutable databases - #471
  • It would be useful to have a debug screen showing which databases are mounted as mutable vs. immutable - maybe a /-/databases page? - #470
  • Need to rework how .inspect() works, see #420
  • Documentation is needed #421
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Default to opening files in mutable mode, special option for immutable files 421551434  
473708724 https://github.com/simonw/datasette/issues/419#issuecomment-473708724 https://api.github.com/repos/simonw/datasette/issues/419 MDEyOklzc3VlQ29tbWVudDQ3MzcwODcyNA== simonw 9599 2019-03-17T19:55:21Z 2019-05-16T03:35:59Z OWNER

Thinking about this further: I think I may have made a mistake establishing "immutable" as the default mode for databases opened by Datasette.

What would it look like if files were NOT opened in immutable mode by default?

Maybe the command to start Datasette looks like this:

datasette mutable1.db mutable2.db --immutable=this_is_immutable.db --immutable=this_is_immutable2.db

So regular file arguments are treated as mutable (and opened in ?mode=ro) while file arguments passed using the new --immutable option are opened in immutable mode.

The -i shortcut has not yet been taken, so this could be abbreviated to:

datasette mutable1.db mutable2.db -i this_is_immutable.db -i this_is_immutable2.db
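
A minimal sketch of how that command-line shape could be parsed. It uses argparse from the standard library purely for illustration, and the parse_args helper and the "datasette-sketch" program name are hypothetical, not Datasette's own code:

```python
import argparse


def parse_args(argv):
    # Hypothetical parser: positional files are mutable, -i/--immutable files are not.
    parser = argparse.ArgumentParser(prog="datasette-sketch")
    parser.add_argument(
        "files", nargs="*", help="database files to open as mutable (?mode=ro)"
    )
    parser.add_argument(
        "-i",
        "--immutable",
        action="append",
        default=[],
        metavar="FILE",
        help="database file to open in immutable mode (repeatable)",
    )
    return parser.parse_args(argv)


args = parse_args(
    [
        "mutable1.db",
        "mutable2.db",
        "-i",
        "this_is_immutable.db",
        "-i",
        "this_is_immutable2.db",
    ]
)
print("mutable:", args.files)
print("immutable:", args.immutable)
```

Using action="append" is what lets the option be repeated once per immutable file, in both the -i and --immutable=... spellings.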
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Default to opening files in mutable mode, special option for immutable files 421551434  
473726619 https://github.com/simonw/datasette/issues/421#issuecomment-473726619 https://api.github.com/repos/simonw/datasette/issues/421 MDEyOklzc3VlQ29tbWVudDQ3MzcyNjYxOQ== simonw 9599 2019-03-17T23:29:47Z 2019-03-17T23:29:47Z OWNER

Needed for #419

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Documentation for ?_hash=1 and Datasette's hashed URL caching 421985685  
473726587 https://github.com/simonw/datasette/issues/420#issuecomment-473726587 https://api.github.com/repos/simonw/datasette/issues/420 MDEyOklzc3VlQ29tbWVudDQ3MzcyNjU4Nw== simonw 9599 2019-03-17T23:29:22Z 2019-03-17T23:29:22Z OWNER

Needed for #419

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Fix all the places that currently use .inspect() data 421971339  
473724868 https://github.com/simonw/datasette/issues/418#issuecomment-473724868 https://api.github.com/repos/simonw/datasette/issues/418 MDEyOklzc3VlQ29tbWVudDQ3MzcyNDg2OA== simonw 9599 2019-03-17T23:07:31Z 2019-03-17T23:07:31Z OWNER

The design of this feature is discussed extensively in the comments on pull request #416

Some demos:

  • https://latest.datasette.io/fixtures/facetable now no longer redirects to the hash
  • https://latest.datasette.io/fixtures/facetable?_hash=1 redirects to https://latest.datasette.io/fixtures-dd88475/facetable

```
~ $ curl -i 'https://latest.datasette.io/fixtures-dd88475/facetable'
HTTP/2 200
date: Sun, 17 Mar 2019 23:05:21 GMT
content-type: text/html; charset=utf-8
content-length: 17555
cache-control: max-age=31536000

</html>~ $ curl -i 'https://latest.datasette.io/fixtures/facetable'
HTTP/2 200
date: Sun, 17 Mar 2019 23:05:40 GMT
content-type: text/html; charset=utf-8
content-length: 17410
cache-control: max-age=5
```

There are now three config settings relevant to the above:

  • default_cache_ttl - defaults to 5s. The default cache TTL for non-hashed resources.
  • default_cache_ttl_hashed - defaults to 31536000s. The default cache TTL for hashed resources.
  • hash_urls - defaults to False. If True, all URLs will attempt to redirect to their hashed version.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Hashed URLs should be optional 421548881  
473717052 https://github.com/simonw/datasette/pull/416#issuecomment-473717052 https://api.github.com/repos/simonw/datasette/issues/416 MDEyOklzc3VlQ29tbWVudDQ3MzcxNzA1Mg== simonw 9599 2019-03-17T21:32:24Z 2019-03-17T21:33:16Z OWNER

Since this feature is now controlled by a config setting, I'm inclined to make it also available via a URL parameter.

If you hit this URL:

/fixtures/table.json?_hash=1

We can redirect to:

/fixtures-c2342/table.json

In this way developers can opt-in to a hashed (and hence far-future cached) response on a per-query basis.

This option won't be available against mutable databases though, which are coming in #419

This means that the hash_urls:1 config basically has the effect of assuming ?_hash=1 on all URLs to immutable databases.
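
A rough sketch of the ?_hash=1 redirect described above. The hashed_redirect function and the hashes mapping (database name to current hash, holding only databases that have a hash) are hypothetical names for illustration, and the c2342 value is just the placeholder from the example URL:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def hashed_redirect(url, hashes):
    """Return the hashed URL to redirect to, or None if no redirect applies."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    if ("_hash", "1") not in params:
        return None
    segments = parts.path.split("/")  # "/fixtures/table.json" -> ["", "fixtures", "table.json"]
    if len(segments) < 2 or segments[1] not in hashes:
        # Unknown database, or one with no hash: nothing to redirect to.
        return None
    db = segments[1]
    segments[1] = "{}-{}".format(db, hashes[db])
    # Drop _hash from the query string but keep every other parameter.
    remaining = [(key, value) for key, value in params if key != "_hash"]
    path = "/".join(segments)
    return path + ("?" + urlencode(remaining) if remaining else "")


print(hashed_redirect("/fixtures/table.json?_hash=1", {"fixtures": "c2342"}))
```

Leaving a database out of the mapping is how this sketch models "no hash available", so requests against it fall through without a redirect.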

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
URL hashing now optional: turn on with --config hash_urls:1 (#418) 421348146  
473715254 https://github.com/simonw/datasette/pull/416#issuecomment-473715254 https://api.github.com/repos/simonw/datasette/issues/416 MDEyOklzc3VlQ29tbWVudDQ3MzcxNTI1NA== simonw 9599 2019-03-17T21:11:37Z 2019-03-17T21:11:37Z OWNER

The code for this has got a bit tricky. I need to decide at some point whether the current request is a hashed_url request (that is, whether it includes a DB hash in the URL and that hash is the current correct one). I then need to be able to use that fact to decide which default TTL value to apply when returning the response.
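
One way that decision could look, as a sketch only. The function names and the current_hashes mapping are hypothetical; the two TTL values are the defaults quoted elsewhere in this thread for default_cache_ttl and default_cache_ttl_hashed, and the "name-hash" URL shape is taken from the demo URLs:

```python
DEFAULT_CACHE_TTL = 5  # seconds, for non-hashed resources
DEFAULT_CACHE_TTL_HASHED = 31536000  # seconds, for hashed resources


def is_hashed_request(db_segment, current_hashes):
    # Split "fixtures-dd88475" into ("fixtures", "-", "dd88475").
    name, separator, url_hash = db_segment.rpartition("-")
    # Only count as hashed when the hash in the URL is the current one.
    return bool(separator) and current_hashes.get(name) == url_hash


def cache_control(db_segment, current_hashes):
    if is_hashed_request(db_segment, current_hashes):
        ttl = DEFAULT_CACHE_TTL_HASHED
    else:
        ttl = DEFAULT_CACHE_TTL
    return "max-age={}".format(ttl)


hashes = {"fixtures": "dd88475"}
print(cache_control("fixtures-dd88475", hashes))
print(cache_control("fixtures", hashes))
```

A stale or unknown hash falls back to the short TTL here; what a real server should do in that case (redirect, 404, or serve) is left open.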

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
URL hashing now optional: turn on with --config hash_urls:1 (#418) 421348146  
473714545 https://github.com/simonw/datasette/pull/416#issuecomment-473714545 https://api.github.com/repos/simonw/datasette/issues/416 MDEyOklzc3VlQ29tbWVudDQ3MzcxNDU0NQ== simonw 9599 2019-03-17T21:03:08Z 2019-03-17T21:04:17Z OWNER

I'm going to introduce a new config setting: default_cache_ttl_hashed - and set the default value for default_cache_ttl to 5s (to protect against dog-piling).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
URL hashing now optional: turn on with --config hash_urls:1 (#418) 421348146  
473713946 https://github.com/simonw/datasette/issues/420#issuecomment-473713946 https://api.github.com/repos/simonw/datasette/issues/420 MDEyOklzc3VlQ29tbWVudDQ3MzcxMzk0Ng== simonw 9599 2019-03-17T20:56:38Z 2019-03-17T20:58:17Z OWNER

Some examples:

https://github.com/simonw/datasette/blob/1f54e092306b208125f39d06712b02895eb75168/datasette/views/table.py#L34-L40

https://github.com/simonw/datasette/blob/1f54e092306b208125f39d06712b02895eb75168/datasette/views/table.py#L45-L48

https://github.com/simonw/datasette/blob/1f54e092306b208125f39d06712b02895eb75168/datasette/views/table.py#L62-L65

https://github.com/simonw/datasette/blob/1f54e092306b208125f39d06712b02895eb75168/datasette/views/table.py#L112-L123

https://github.com/simonw/datasette/blob/1f54e092306b208125f39d06712b02895eb75168/datasette/views/index.py#L11-L19

https://github.com/simonw/datasette/blob/afe9aa3ae03c485c5d6652741438d09445a486c1/datasette/views/base.py#L143-L147

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Fix all the places that currently use .inspect() data 421971339  
473713363 https://github.com/simonw/datasette/issues/419#issuecomment-473713363 https://api.github.com/repos/simonw/datasette/issues/419 MDEyOklzc3VlQ29tbWVudDQ3MzcxMzM2Mw== simonw 9599 2019-03-17T20:49:39Z 2019-03-17T20:52:46Z OWNER

And a really important difference: the whole model of caching inspect data no longer works for mutable files, because another process might make a change to the database schema (adding a new table for example).

https://fivethirtyeight.datasettes.com/-/inspect

So everywhere that uses self.ds.inspect() right now will have to change to calling a routine which knows the difference between mutable and immutable databases and queries for live schema data for mutables while using a cache for immutables.
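
A sketch of what such a routine might look like, assuming the schema question is "which tables exist". The Database class and its method names are hypothetical; only the sqlite_master query is standard SQLite:

```python
import sqlite3


class Database:
    """Hypothetical wrapper that knows whether its file is mutable."""

    def __init__(self, conn, is_mutable):
        self.conn = conn
        self.is_mutable = is_mutable
        self._cached_table_names = None

    def _query_table_names(self):
        rows = self.conn.execute(
            "select name from sqlite_master where type = 'table' order by name"
        ).fetchall()
        return [row[0] for row in rows]

    def table_names(self):
        if self.is_mutable:
            # Another process may have changed the schema: always ask SQLite.
            return self._query_table_names()
        if self._cached_table_names is None:
            # Immutable files cannot change, so one lookup is enough.
            self._cached_table_names = self._query_table_names()
        return self._cached_table_names


conn = sqlite3.connect(":memory:")
conn.execute("create table a (id integer primary key)")
live = Database(conn, is_mutable=True)
cached = Database(conn, is_mutable=False)
live.table_names()
cached.table_names()
conn.execute("create table b (id integer primary key)")
print(live.table_names())  # sees the new table
print(cached.table_names())  # still the cached list
```

Both wrappers share one in-memory connection only to make the difference visible; with a genuinely immutable file the cached list would never go stale.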

I'll track this as a separate ticket.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Default to opening files in mutable mode, special option for immutable files 421551434  
473712820 https://github.com/simonw/datasette/issues/419#issuecomment-473712820 https://api.github.com/repos/simonw/datasette/issues/419 MDEyOklzc3VlQ29tbWVudDQ3MzcxMjgyMA== simonw 9599 2019-03-17T20:43:23Z 2019-03-17T20:43:51Z OWNER

So the differences here are:

  • For immutable databases we calculate the content hash and table counts; for mutable databases we do not
  • Immutable databases open with file:{}?immutable=1, mutable databases open with file:{}?mode=ro
  • Anywhere that shows a table count now needs to call a new method which knows to run count(*) with a timeout for mutable databases and to read from the precalculated counts for immutable databases
  • The url-hash option should no longer be available at all for mutable databases
  • New command-line tool syntax: datasette mutable.db vs. datasette -i immutable.db
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Default to opening files in mutable mode, special option for immutable files 421551434  
473709883 https://github.com/simonw/datasette/issues/419#issuecomment-473709883 https://api.github.com/repos/simonw/datasette/issues/419 MDEyOklzc3VlQ29tbWVudDQ3MzcwOTg4Mw== simonw 9599 2019-03-17T20:09:47Z 2019-03-17T20:37:45Z OWNER

Could I persist the last calculated count for a table and somehow detect if that table has been changed in any way by another process, hence invalidating the cached count (and potentially scheduling a new count)?

https://www.sqlite.org/c3ref/update_hook.html says that sqlite3_update_hook() can be used to register a handler invoked on almost all update/insert/delete operations to a specific table... except that it misses out on deletes triggered by ON CONFLICT REPLACE and only works for ROWID tables.

Also this hook is not exposed in the Python sqlite3 library - though it may be available using some terrifying ctypes hacks: https://stackoverflow.com/a/16920926

So on further research, I think the answer is no: I should assume that it won't be possible to cache counts and magically invalidate the cache when the underlying file is changed by another process.

Instead I need to assume that counts will be an expensive operation.

As such, I can introduce a time limit on counts and use that anywhere a count is displayed. If the time limit is exceeded by the count(*) query I can show "many" instead.
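
A sketch of one way to put a time limit on a counting query, using the progress handler from Python's sqlite3 module. The count_or_many name and the choice to check the clock every 1000 virtual machine instructions are assumptions, and how promptly the handler fires during a plain count(*) depends on SQLite internals that this sketch does not verify:

```python
import sqlite3
import time


def count_or_many(conn, sql, time_limit_ms=20):
    """Run a counting query; return its result, or "many" if it runs too long."""
    deadline = time.perf_counter() + time_limit_ms / 1000.0

    def out_of_time():
        # A non-zero return value tells SQLite to abort the running query.
        return 1 if time.perf_counter() > deadline else 0

    # Ask SQLite to call out_of_time() every 1000 virtual machine instructions.
    conn.set_progress_handler(out_of_time, 1000)
    try:
        return conn.execute(sql).fetchone()[0]
    except sqlite3.OperationalError as error:
        if "interrupt" in str(error):
            return "many"
        raise  # a genuine SQL error, not a timeout
    finally:
        conn.set_progress_handler(None, 1000)


conn = sqlite3.connect(":memory:")
conn.execute("create table t (n integer)")
conn.executemany("insert into t values (?)", [(i,) for i in range(1000)])
print(count_or_many(conn, "select count(*) from t"))

# A deliberately slow counting query to exercise the timeout path.
slow = (
    "with recursive c(x) as "
    "(select 1 union all select x + 1 from c where x < 50000000) "
    "select count(*) from c"
)
print(count_or_many(conn, slow, time_limit_ms=10))
```

The handler is removed again in the finally block so that later queries on the same connection are not affected by an expired deadline.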

That said... running count(*) against a table with 200,000 rows only takes about 3ms, so even a timeout of 20ms is likely to work fine for tables of around a million rows.

It would be really neat if I could generate a lower bound count in a limited amount of time. If I counted up to 4m rows before the timeout I could show "more than 4m rows". No idea if that would be possible though.

Relevant: https://stackoverflow.com/questions/8988915/sqlite-count-slow-on-big-tables - reports of very slow counts on 6GB database file. Consensus seems to be "yeah, that's just how SQLite is built" - though there was a suggestion that you can use select max(ROWID) from table provided you are certain there have been no deletions.
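
A small demonstration of why the max(ROWID) shortcut only matches count(*) when nothing has been deleted. The table and its contents are made up for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (name text)")
conn.executemany(
    "insert into t (name) values (?)",
    [("row {}".format(i),) for i in range(5)],
)


def count_and_max_rowid(conn):
    # Returns (count(*), max(rowid)) for table t in a single query.
    return conn.execute("select count(*), max(rowid) from t").fetchone()


before = count_and_max_rowid(conn)
conn.execute("delete from t where rowid = 2")
after = count_and_max_rowid(conn)

print(before)  # (5, 5): the two numbers agree
print(after)  # (4, 5): max(rowid) now overstates the row count
```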

Also relevant: http://sqlite.1065341.n5.nabble.com/sqlite3-performance-on-select-count-very-slow-for-16-GB-file-td80176.html

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Default to opening files in mutable mode, special option for immutable files 421551434  
473709815 https://github.com/simonw/datasette/issues/418#issuecomment-473709815 https://api.github.com/repos/simonw/datasette/issues/418 MDEyOklzc3VlQ29tbWVudDQ3MzcwOTgxNQ== simonw 9599 2019-03-17T20:08:31Z 2019-03-17T20:08:31Z OWNER

In #419 I'm now proposing that Datasette default to opening files in "mutable" mode, in which case it would not make sense to support hash URLs for those files at all. So actually this feature will only be available for files that are explicitly opened in immutable mode.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Hashed URLs should be optional 421548881  
473708941 https://github.com/simonw/datasette/issues/419#issuecomment-473708941 https://api.github.com/repos/simonw/datasette/issues/419 MDEyOklzc3VlQ29tbWVudDQ3MzcwODk0MQ== simonw 9599 2019-03-17T19:58:11Z 2019-03-17T19:58:11Z OWNER

Some problems to solve:

  • Right now Datasette assumes it can always show the count of rows in a table, because this has been pre-calculated. If a database is mutable the pre-calculation trick no longer works, and for giant tables a select count(*) from X query can be expensive to run. Maybe we set a time limit on these? If the time limit expires, show "many rows"?
  • Maintaining a content hash of the table no longer makes sense if it is changing (though interestingly there's a .sha3sum built-in SQLite CLI command which takes a hash of the content and stays the same even through vacuum runs). Without that we need a different mechanism for calculating table colours. It also means that we can't do the special dbname-hash URL trick (see #418) at all if the database is opened as mutable.
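
To make the content-hash problem concrete, here is a sketch that hashes a file's bytes and shows the hash changing once the file is modified. SHA-256 over the raw bytes and the seven-character prefix (the length seen in URLs like fixtures-dd88475) are assumptions for illustration, not a statement of how Datasette computes its hash:

```python
import hashlib
import os
import tempfile


def content_hash(path, chunk_size=1024 * 1024):
    # Hash the file in chunks so large databases are not read into memory at once.
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# A stand-in file; any bytes will do for demonstrating the effect.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
with open(path, "wb") as f:
    f.write(b"first version")
before = content_hash(path)

with open(path, "ab") as f:
    f.write(b" plus a change")
after = content_hash(path)

print(before[:7], after[:7], before == after)
```

Any URL built from the first prefix would point at stale content after the change, which is why the dbname-hash trick only suits files that never change.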
{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Default to opening files in mutable mode, special option for immutable files 421551434  


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
, [performed_via_github_app] TEXT);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);