github
html_url | issue_url | id | node_id | user | created_at | updated_at | author_association | body | reactions | issue | performed_via_github_app |
---|---|---|---|---|---|---|---|---|---|---|---|
https://github.com/simonw/sqlite-utils/issues/235#issuecomment-1504288134 | https://api.github.com/repos/simonw/sqlite-utils/issues/235 | 1504288134 | IC_kwDOCGYnMM5ZqZ2G | 9599 | 2023-04-11T23:55:06Z | 2023-04-12T03:34:32Z | OWNER | Also checked the official Datasette Docker image - I had to run that in Codespaces because it doesn't currently work on my M2 Mac: ``` codespace@codespaces-112c61:/workspaces/sqlite-utils$ docker pull datasetteproject/datasette Using default tag: latest ... codespace@codespaces-112c61:/workspaces/sqlite-utils$ docker run -it datasetteproject/datasette /bin/bash root@75ba34f501ec:/# python Python 3.11.0 (main, Dec 6 2022, 13:31:55) [GCC 10.2.1 20210110] on linux Type "help", "copyright", "credits" or "license" for more information. >>> import sqlite3 >>> db = sqlite3.connect(":memory:") >>> db.executescript(""" ... PRAGMA writable_schema = 1; ... UPDATE sqlite_master SET sql = 'CREATE TABLE [foos] (id integer primary key)'; ... PRAGMA writable_schema = 0; ... """) <sqlite3.Cursor object at 0x7fd9b0561140> >>> ``` So that confirms that the official image also has a Python with a SQLite that's not in defensive mode. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
810618495 | |
https://github.com/simonw/datasette/issues/2058#issuecomment-1504426792 | https://api.github.com/repos/simonw/datasette/issues/2058 | 1504426792 | IC_kwDOBm6k_c5Zq7so | 9599 | 2023-04-12T02:02:42Z | 2023-04-12T02:02:42Z | OWNER | I tightened up the benchmark (it was measuring the time taken to create the tables too) and got this: ![image](https://user-images.githubusercontent.com/9599/231328328-85ca35ac-a11b-46f4-b132-dae367103570.png) | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1663399821 | |
https://github.com/simonw/datasette/issues/2058#issuecomment-1504328395 | https://api.github.com/repos/simonw/datasette/issues/2058 | 1504328395 | IC_kwDOBm6k_c5ZqjrL | 9599 | 2023-04-12T00:28:38Z | 2023-04-12T00:28:38Z | OWNER | Here's a much better chart, which shows that MD5 performance unsurprisingly gets worse as the number of tables increases while `schema_version` remains constant: ![image](https://user-images.githubusercontent.com/9599/231316778-513bd99f-5ea4-495c-b86d-c572a7106369.png) | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1663399821 | |
https://github.com/simonw/datasette/issues/2058#issuecomment-1504315697 | https://api.github.com/repos/simonw/datasette/issues/2058 | 1504315697 | IC_kwDOBm6k_c5Zqgkx | 9599 | 2023-04-12T00:16:22Z | 2023-04-12T00:27:12Z | OWNER | I got ChatGPT (code execution alpha) to run a micro-benchmark for me. This was the conclusion: > The benchmark using `PRAGMA schema_version` is approximately 1.36 times faster than the benchmark using `hashlib.md5` for the case with 100 tables. For the case with 200 tables, the benchmark using `PRAGMA schema_version` is approximately 2.33 times faster than the benchmark using `hashlib.md5`. Here's the chart it drew me: ![image](https://user-images.githubusercontent.com/9599/231315366-3a12b6d3-08d7-419d-a1fd-36eb24da0d85.png) (It's a pretty rubbish chart though, it only took measurements at 100 and 200 and drew a line between the two, I should have told it to measure every 10 and plot that) And the full transcript: https://gist.github.com/simonw/2fc46effbfbe49e6de0bcfdc9e31b235 The benchmark looks good enough on first glance that I don't feel the need to be more thorough with it. `PRAGMA schema_version` is faster, but not so fast that I feel like the MD5 hack is worth worrying about too much. I'm tempted to add something to the `/-/versions` page that tries to identify if this is a problem or not though. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1663399821 | |
https://github.com/simonw/datasette/issues/2058#issuecomment-1504298448 | https://api.github.com/repos/simonw/datasette/issues/2058 | 1504298448 | IC_kwDOBm6k_c5ZqcXQ | 9599 | 2023-04-12T00:04:01Z | 2023-04-12T00:04:01Z | OWNER | Here's a potential workaround: when I store the schema versions, I could also store an MD5 hash of the full schema (`select group_concat(sql) from sqlite_master`). When I read the schema version with `PRAGMA schema_version` I could catch that exception and, if I see it, I could calculate that MD5 hash again as a fallback and use that to determine if the schema has changed instead. The performance overhead of this needs investigating - how much more expensive is `md5(... that SQL query result)` compared to just `PRAGMA schema_version`, especially on a database with a lot of tables? | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1663399821 | |
https://github.com/simonw/datasette/issues/2058#issuecomment-1504295345 | https://api.github.com/repos/simonw/datasette/issues/2058 | 1504295345 | IC_kwDOBm6k_c5Zqbmx | 9599 | 2023-04-12T00:01:42Z | 2023-04-12T00:02:26Z | OWNER | Here's the relevant code: https://github.com/simonw/datasette/blob/5890a20c374fb0812d88c9b0ef26a838bfa06c76/datasette/app.py#L421-L437 This function is called on almost every request (everything that subclasses `BaseView` at least - need to remember that for the refactor in #2053 etc). https://github.com/simonw/datasette/blob/5890a20c374fb0812d88c9b0ef26a838bfa06c76/datasette/views/base.py#L101-L103 It uses `PRAGMA schema_version` as a cheap way to determine if the schema has changed, in which case it needs to refresh the internal schema tables. This was already the cause of a subtle bug here: - #1231 | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1663399821 |
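
The workaround described in the datasette#2058 comments (store an MD5 hash of the full schema alongside the schema version, and fall back to the hash when `PRAGMA schema_version` raises) could look roughly like the sketch below. This is not Datasette's actual code: the function names `schema_md5`, `snapshot` and `schema_changed`, the dictionary layout, and the choice to catch `sqlite3.OperationalError` are all assumptions made for illustration.

```python
import hashlib
import sqlite3


def schema_md5(conn):
    # Hash the concatenated CREATE statements, as suggested in the comment.
    # group_concat() returns NULL for an empty sqlite_master, hence the "or".
    row = conn.execute("select group_concat(sql) from sqlite_master").fetchone()
    return hashlib.md5((row[0] or "").encode("utf-8")).hexdigest()


def read_schema_version(conn):
    # Assumption: the failure surfaces as sqlite3.OperationalError.
    try:
        return conn.execute("PRAGMA schema_version").fetchone()[0]
    except sqlite3.OperationalError:
        return None


def snapshot(conn):
    # Hypothetical helper: record both values when the schema is stored.
    return {"version": read_schema_version(conn), "md5": schema_md5(conn)}


def schema_changed(conn, stored):
    # Prefer the cheap PRAGMA; only compute the MD5 when it is unavailable.
    version = read_schema_version(conn)
    if version is not None and stored["version"] is not None:
        return version != stored["version"]
    return schema_md5(conn) != stored["md5"]
```

With this shape the MD5 is computed once per stored snapshot and again only on the fallback path, so the per-request cost stays at a single `PRAGMA schema_version` read whenever that PRAGMA works.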