html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,issue,performed_via_github_app
https://github.com/simonw/datasette/issues/419#issuecomment-473713363,https://api.github.com/repos/simonw/datasette/issues/419,473713363,MDEyOklzc3VlQ29tbWVudDQ3MzcxMzM2Mw==,9599,2019-03-17T20:49:39Z,2019-03-17T20:52:46Z,OWNER,"And a really important difference: the whole model of caching inspect data no longer works for mutable files, because another process might make a change to the database schema (adding a new table, for example). https://fivethirtyeight.datasettes.com/-/inspect

So everywhere that uses `self.ds.inspect()` right now will have to change to call a routine which knows the difference between mutable and immutable databases, querying live schema data for mutables while using a cache for immutables.

I'll track this as a separate ticket.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",421551434,
https://github.com/simonw/datasette/issues/419#issuecomment-473712820,https://api.github.com/repos/simonw/datasette/issues/419,473712820,MDEyOklzc3VlQ29tbWVudDQ3MzcxMjgyMA==,9599,2019-03-17T20:43:23Z,2019-03-17T20:43:51Z,OWNER,"So the differences here are:

* For immutable databases we calculate content hash and table counts; for mutable databases we do not
* Immutable databases open with `file:{}?immutable=1`, mutable databases open with `file:{}?mode=ro`
* Anywhere that shows a table count now needs to call a new method which knows to run `count(*)` with a timeout for mutable databases and read from the precalculated counts for immutable databases
* The url-hash option should no longer be available at all for mutable databases
* New command-line tool syntax: `datasette mutable.db` vs. `datasette -i immutable.db`","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",421551434,
https://github.com/simonw/datasette/issues/419#issuecomment-473709883,https://api.github.com/repos/simonw/datasette/issues/419,473709883,MDEyOklzc3VlQ29tbWVudDQ3MzcwOTg4Mw==,9599,2019-03-17T20:09:47Z,2019-03-17T20:37:45Z,OWNER,"Could I persist the last calculated count for a table and somehow detect if that table has been changed in any way by another process, hence invalidating the cached count (and potentially scheduling a new count)?

https://www.sqlite.org/c3ref/update_hook.html says that `sqlite3_update_hook()` can be used to register a handler invoked on almost all update/insert/delete operations to a specific table... except that it misses out on deletes triggered by `ON CONFLICT REPLACE` and only works for `ROWID` tables. Also, this hook is not exposed in the Python `sqlite3` library - though it may be available using some terrifying `ctypes` hacks: https://stackoverflow.com/a/16920926

So on further research, I think the answer is *no*: I should assume that it won't be possible to cache counts and magically invalidate the cache when the underlying file is changed by another process. Instead I need to assume that counts will be an expensive operation. As such, I can introduce a time limit on counts and use that anywhere a count is displayed. If the time limit is exceeded by the `count(*)` query I can show ""many"" instead.

That said... running `count(*)` against a table with 200,000 rows takes only about 3ms, so even a timeout of 20ms is likely to work fine for tables of around a million rows.

It would be really neat if I could generate a lower-bound count in a limited amount of time. If I counted up to 4m rows before the timeout I could show ""more than 4m rows"". No idea if that would be possible though.

Relevant: https://stackoverflow.com/questions/8988915/sqlite-count-slow-on-big-tables - reports of very slow counts on a 6GB database file. Consensus seems to be ""yeah, that's just how SQLite is built"" - though there was a suggestion that you can use `select max(ROWID) from table`, provided you are certain there have been no deletions.

Also relevant: http://sqlite.1065341.n5.nabble.com/sqlite3-performance-on-select-count-very-slow-for-16-GB-file-td80176.html","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",421551434,
https://github.com/simonw/datasette/issues/419#issuecomment-473708941,https://api.github.com/repos/simonw/datasette/issues/419,473708941,MDEyOklzc3VlQ29tbWVudDQ3MzcwODk0MQ==,9599,2019-03-17T19:58:11Z,2019-03-17T19:58:11Z,OWNER,"Some problems to solve:

* Right now Datasette assumes it can always show the count of rows in a table, because this has been pre-calculated. If a database is mutable the pre-calculation trick no longer works, and for giant tables a `select count(*) from X` query can be expensive to run. Maybe we set a time limit on these? If the time limit expires, show ""many rows""?
* Maintaining a content hash of the table no longer makes sense if it is changing (though interestingly there's a `.sha3sum` built-in SQLite CLI command which takes a hash of the content and stays the same even through vacuum runs). Without that we need a different mechanism for calculating table colours. It also means that we can't do the special dbname-hash URL trick (see #418) at all if the database is opened as mutable.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",421551434,
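
Editor's note: the comments above describe two mechanisms - opening immutable databases with SQLite's `immutable=1` URI flag (versus read-only `mode=ro` for mutable ones), and running `count(*)` under a time limit that falls back to "many". Below is a minimal Python sketch of that approach using only the standard library's `sqlite3` module. The function names (`connect_database`, `count_with_time_limit`) and the 20 ms default are illustrative assumptions, not Datasette's actual implementation.

```python
import sqlite3
import time


def connect_database(path, immutable=False):
    # Immutable databases are opened with SQLite's immutable=1 URI flag;
    # mutable ones are opened read-only so we never write to them.
    if immutable:
        uri = "file:{}?immutable=1".format(path)
    else:
        uri = "file:{}?mode=ro".format(path)
    return sqlite3.connect(uri, uri=True)


def count_with_time_limit(conn, table, ms=20):
    # Run count(*) but abort if it exceeds `ms` milliseconds. The sqlite3
    # progress handler is invoked every `n` VM instructions; returning a
    # truthy value interrupts the query, raising OperationalError.
    deadline = time.monotonic() + ms / 1000
    conn.set_progress_handler(lambda: time.monotonic() > deadline, 1000)
    try:
        return conn.execute(
            "select count(*) from [{}]".format(table)
        ).fetchone()[0]
    except sqlite3.OperationalError:
        return None  # caller renders this as "many"
    finally:
        conn.set_progress_handler(None, 1000)
```

For immutable databases the precalculated count would be returned directly; `count_with_time_limit` would only be consulted for databases opened in `mode=ro`.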