{"html_url": "https://github.com/simonw/datasette/issues/1249#issuecomment-804347152", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1249", "id": 804347152, "node_id": "MDEyOklzc3VlQ29tbWVudDgwNDM0NzE1Mg==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-22T19:47:56Z", "updated_at": "2021-03-22T19:48:03Z", "author_association": "OWNER", "body": "I wrote a bunch of tips on creating smaller Docker images here: https://simonwillison.net/2018/Nov/19/smaller-python-docker-images/", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 824064069, "label": "Updated Dockerfile with SpatiaLite version 5.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1249#issuecomment-804344553", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1249", "id": 804344553, "node_id": "MDEyOklzc3VlQ29tbWVudDgwNDM0NDU1Mw==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-22T19:43:25Z", "updated_at": "2021-03-22T19:43:25Z", "author_association": "OWNER", "body": "Does `--no-install-recommends` make a difference?", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 824064069, "label": "Updated Dockerfile with SpatiaLite version 5.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1249#issuecomment-804338678", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1249", "id": 804338678, "node_id": "MDEyOklzc3VlQ29tbWVudDgwNDMzODY3OA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-22T19:33:43Z", "updated_at": "2021-03-22T19:33:43Z", "author_association": "OWNER", "body": "Replacing `rm -rf /var/lib/{apt,dpkg,cache,log}/` with \r\n```\r\n rm -rf /var/lib/apt && \\\r\n rm -rf 
/var/lib/dpkg\r\n```\r\nGot the size down to 305MB.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 824064069, "label": "Updated Dockerfile with SpatiaLite version 5.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1249#issuecomment-804318314", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1249", "id": 804318314, "node_id": "MDEyOklzc3VlQ29tbWVudDgwNDMxODMxNA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-22T19:04:30Z", "updated_at": "2021-03-22T19:04:30Z", "author_association": "OWNER", "body": "Considering the image on Docker Hub right now is `383MB` this is actually an improvement.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 824064069, "label": "Updated Dockerfile with SpatiaLite version 5.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1249#issuecomment-804317545", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1249", "id": 804317545, "node_id": "MDEyOklzc3VlQ29tbWVudDgwNDMxNzU0NQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-22T19:03:22Z", "updated_at": "2021-03-22T19:03:22Z", "author_association": "OWNER", "body": "This Dockerfile:\r\n```dockerfile\r\nFROM python:3.9.2-slim-buster as build\r\n\r\n# software-properties-common provides add-apt-repository\r\nRUN apt-get update && \\\r\n apt-get -y install software-properties-common && \\\r\n add-apt-repository \"deb http://httpredir.debian.org/debian sid main\" && \\\r\n apt-get update && \\\r\n apt-get -t sid install -y libsqlite3-mod-spatialite && \\\r\n apt clean && \\\r\n rm -rf /var/lib/{apt,dpkg,cache,log}/\r\n\r\nRUN pip install datasette\r\n\r\nEXPOSE 8001\r\nCMD 
[\"datasette\"]\r\n```\r\nProduces a 344MB image that includes a working SpatiaLite 5.0 module. And weirdly... it doesn't exhibit the hanging bug!", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 824064069, "label": "Updated Dockerfile with SpatiaLite version 5.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1249#issuecomment-804310353", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1249", "id": 804310353, "node_id": "MDEyOklzc3VlQ29tbWVudDgwNDMxMDM1Mw==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-22T18:52:12Z", "updated_at": "2021-03-22T18:52:12Z", "author_association": "OWNER", "body": "This Dockerfile:\r\n```dockerfile\r\nFROM python:3.9.2-slim-buster as build\r\n\r\n# Setup build dependencies\r\nRUN apt update \\\r\n && apt install -y python3-dev build-essential wget libxml2-dev libproj-dev \\\r\n libminizip-dev libgeos-dev libsqlite3-dev zlib1g-dev pkg-config git \\\r\n && apt clean\r\n\r\nRUN wget \"https://www.sqlite.org/2021/sqlite-autoconf-3340100.tar.gz\" && tar xzf sqlite-autoconf-3340100.tar.gz \\\r\n && cd sqlite-autoconf-3340100 && ./configure --disable-static --enable-fts5 --enable-json1 \\\r\n CFLAGS=\"-g -O2 -DSQLITE_ENABLE_FTS3=1 -DSQLITE_ENABLE_FTS3_PARENTHESIS -DSQLITE_ENABLE_FTS4=1 -DSQLITE_ENABLE_RTREE=1 -DSQLITE_ENABLE_JSON1\" \\\r\n && make && make install\r\n\r\nRUN wget \"http://www.gaia-gis.it/gaia-sins/freexl-1.0.6.tar.gz\" && tar zxf freexl-1.0.6.tar.gz \\\r\n && cd freexl-1.0.6 && ./configure && make && make install\r\n\r\nRUN wget \"http://www.gaia-gis.it/gaia-sins/libspatialite-5.0.1.tar.gz\" && tar zxf libspatialite-5.0.1.tar.gz \\\r\n && cd libspatialite-5.0.1 && ./configure --disable-rttopo && make && make install\r\n\r\nRUN wget \"http://www.gaia-gis.it/gaia-sins/readosm-sources/readosm-1.1.0.tar.gz\" && tar zxf 
readosm-1.1.0.tar.gz && cd readosm-1.1.0 && ./configure && make && make install\r\n\r\nRUN wget \"http://www.gaia-gis.it/gaia-sins/spatialite-tools-5.0.0.tar.gz\" && tar zxf spatialite-tools-5.0.0.tar.gz \\\r\n && cd spatialite-tools-5.0.0 && ./configure --disable-rttopo && make && make install\r\n\r\n# Add local code to the image instead of fetching from pypi.\r\n#COPY . /datasette\r\n#RUN pip install /datasette\r\nRUN pip install datasette\r\n\r\nFROM python:3.9.2-slim-buster\r\n\r\n# Copy python dependencies and spatialite libraries\r\nCOPY --from=build /usr/local/lib/ /usr/local/lib/\r\n# Copy executables\r\nCOPY --from=build /usr/local/bin /usr/local/bin\r\n# Copy spatial extensions\r\nCOPY --from=build /usr/lib/x86_64-linux-gnu /usr/lib/x86_64-linux-gnu\r\n\r\nENV LD_LIBRARY_PATH=/usr/local/lib\r\n\r\nEXPOSE 8001\r\nCMD [\"datasette\"]\r\n```\r\nProduced a 448MB image.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 824064069, "label": "Updated Dockerfile with SpatiaLite version 5.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1249#issuecomment-804309510", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1249", "id": 804309510, "node_id": "MDEyOklzc3VlQ29tbWVudDgwNDMwOTUxMA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-22T18:50:50Z", "updated_at": "2021-03-22T18:50:50Z", "author_association": "OWNER", "body": "Ideally I'd like to use the Debian stable `python:3.9.2-slim-buster` base image but install SpatiaLite from Debian unstable here: https://packages.debian.org/sid/libspatialite7\r\n\r\nThis pattern might let me do that: https://github.com/helmesjo/cpp_bash_utils/blob/f031e926249f8e2d7f260f22dc8974c6d5be11fe/docker/images/linux-gcc.dockerfile#L20-L24", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, 
\"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 824064069, "label": "Updated Dockerfile with SpatiaLite version 5.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/pull/1271#issuecomment-804265042", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1271", "id": 804265042, "node_id": "MDEyOklzc3VlQ29tbWVudDgwNDI2NTA0Mg==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-22T17:45:45Z", "updated_at": "2021-03-22T17:45:45Z", "author_association": "OWNER", "body": "I can remove this code too: \r\nhttps://github.com/simonw/datasette/blob/6f41c8a2bef309a66588b2875c3e24d26adb4850/datasette/database.py#L190-L192", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 837956424, "label": "Use SQLite conn.interrupt() instead of sqlite_timelimit()"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1249#issuecomment-804263434", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1249", "id": 804263434, "node_id": "MDEyOklzc3VlQ29tbWVudDgwNDI2MzQzNA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-22T17:43:25Z", "updated_at": "2021-03-22T17:43:25Z", "author_association": "OWNER", "body": "I figured out the cause of the hang in #1268 - it was caused by `select count(*) from SpatialIndex` interacting badly with the `set_progress_handler()` mechanism I was using to implement query time limits. 
#1271 has a replacement for that using `asyncio.wait_for()` and `conn.interrupt()` which should resolve the SpatiaLite issue too.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 824064069, "label": "Updated Dockerfile with SpatiaLite version 5.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1268#issuecomment-804261915", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1268", "id": 804261915, "node_id": "MDEyOklzc3VlQ29tbWVudDgwNDI2MTkxNQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-22T17:41:12Z", "updated_at": "2021-03-22T17:41:12Z", "author_association": "OWNER", "body": "Closing this because I've figured out the root of the problem now, and I have a potential solution.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 837308703, "label": "Figure out why SpatiaLite 5.0 hangs the database page on Linux"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1269#issuecomment-804261610", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1269", "id": 804261610, "node_id": "MDEyOklzc3VlQ29tbWVudDgwNDI2MTYxMA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-22T17:40:41Z", "updated_at": "2021-03-22T17:40:41Z", "author_association": "OWNER", "body": "#1270 looks promising, and I don't want to leave open a security hole where someone could potentially hang Datasette with a nasty `count(*)` query.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 837348479, "label": "Don't attempt to run count(*) against virtual tables"}, "performed_via_github_app": null} 
{"html_url": "https://github.com/simonw/datasette/issues/1270#issuecomment-804255633", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1270", "id": 804255633, "node_id": "MDEyOklzc3VlQ29tbWVudDgwNDI1NTYzMw==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-22T17:32:02Z", "updated_at": "2021-03-22T17:32:08Z", "author_association": "OWNER", "body": "Confirmed that the `interrupt()` based cancellation mechanism fixes the SpatiaLite issue in #1268!", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 837350092, "label": "Try implementing SQLite timeouts using .interrupt() instead of using .set_progress_handler()"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1270#issuecomment-803834784", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1270", "id": 803834784, "node_id": "MDEyOklzc3VlQ29tbWVudDgwMzgzNDc4NA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-22T07:31:57Z", "updated_at": "2021-03-22T16:22:19Z", "author_association": "OWNER", "body": "I think the implementation for this goes here: https://github.com/simonw/datasette/blob/6f41c8a2bef309a66588b2875c3e24d26adb4850/datasette/database.py#L146-L157\r\n\r\nI figured out a similar pattern in `datasette-ripgrep` here: https://github.com/simonw/datasette-ripgrep/blob/0.7/datasette_ripgrep/__init__.py#L63-L71", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 837350092, "label": "Try implementing SQLite timeouts using .interrupt() instead of using .set_progress_handler()"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1268#issuecomment-803802957", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1268", 
"id": 803802957, "node_id": "MDEyOklzc3VlQ29tbWVudDgwMzgwMjk1Nw==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-22T06:38:14Z", "updated_at": "2021-03-22T06:38:14Z", "author_association": "OWNER", "body": "Also worth trying is to change this code:\r\n```python\r\n n = 1000 \r\n if ms < 50: \r\n n = 1 \r\n```\r\nWhat happens with `n = 10` instead?", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 837308703, "label": "Figure out why SpatiaLite 5.0 hangs the database page on Linux"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1269#issuecomment-803785808", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1269", "id": 803785808, "node_id": "MDEyOklzc3VlQ29tbWVudDgwMzc4NTgwOA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-22T06:00:53Z", "updated_at": "2021-03-22T06:00:53Z", "author_association": "OWNER", "body": "This may not be necessary if using `.interrupt() for SQLite timeouts in #1270 works.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 837348479, "label": "Don't attempt to run count(*) against virtual tables"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1268#issuecomment-803784902", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1268", "id": 803784902, "node_id": "MDEyOklzc3VlQ29tbWVudDgwMzc4NDkwMg==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-22T05:59:06Z", "updated_at": "2021-03-22T05:59:06Z", "author_association": "OWNER", "body": "Even if I implement that workaround in #1269 I'm concerned that this could still allow users to deliberately crash Datasette (if it's running SpatiaLite 5.0) by executing `select count(*) from 
SpatialIndex`.\r\n\r\nThat `interrupt` timeout mechanism is worth digging into further.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 837308703, "label": "Figure out why SpatiaLite 5.0 hangs the database page on Linux"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1268#issuecomment-803782705", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1268", "id": 803782705, "node_id": "MDEyOklzc3VlQ29tbWVudDgwMzc4MjcwNQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-22T05:54:19Z", "updated_at": "2021-03-22T05:54:19Z", "author_association": "OWNER", "body": "Got two new TILs out of this:\r\n\r\n* [Tracing every executed Python statement](https://til.simonwillison.net/python/tracing-every-statement)\r\n* [Running gdb against a Python process in a running Docker container](https://til.simonwillison.net/docker/gdb-python-docker)", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 837308703, "label": "Figure out why SpatiaLite 5.0 hangs the database page on Linux"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1268#issuecomment-803777724", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1268", "id": 803777724, "node_id": "MDEyOklzc3VlQ29tbWVudDgwMzc3NzcyNA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-22T05:42:50Z", "updated_at": "2021-03-22T05:43:23Z", "author_association": "OWNER", "body": "\"tuscany_housenumbers__select___from_sqlite_master_where_sql_like__create_virtual_table__\"\r\n\r\nIf I want to avoid counting virtual tables, I need to detect which tables are virtual tables.\r\n\r\nThe safest way to do this is probably to pull the `sql` for every table and then, 
in Python, check for values that start with `create virtual table` after converting to lower case, using any number of spaces.\r\n\r\nThis would catch things like ` CREATE virtual TABLE` which might be missed by a SQL `like` query. ", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 837308703, "label": "Figure out why SpatiaLite 5.0 hangs the database page on Linux"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1268#issuecomment-803775121", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1268", "id": 803775121, "node_id": "MDEyOklzc3VlQ29tbWVudDgwMzc3NTEyMQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-22T05:36:26Z", "updated_at": "2021-03-22T05:36:26Z", "author_association": "OWNER", "body": "So one fix could be to avoid running counts for anything that turns out to be a virtual table.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 837308703, "label": "Figure out why SpatiaLite 5.0 hangs the database page on Linux"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1268#issuecomment-803774926", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1268", "id": 803774926, "node_id": "MDEyOklzc3VlQ29tbWVudDgwMzc3NDkyNg==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-22T05:35:56Z", "updated_at": "2021-03-22T05:35:56Z", "author_association": "OWNER", "body": "That's in this code here: https://github.com/simonw/datasette/blob/c4f1ec7f33fd7d5b93f0f895dafb5351cc3bfc5b/datasette/database.py#L221-L241", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 
837308703, "label": "Figure out why SpatiaLite 5.0 hangs the database page on Linux"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1268#issuecomment-803774518", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1268", "id": 803774518, "node_id": "MDEyOklzc3VlQ29tbWVudDgwMzc3NDUxOA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-22T05:34:57Z", "updated_at": "2021-03-22T05:34:57Z", "author_association": "OWNER", "body": "... and sure enough, adding this code fixed the problem:\r\n```diff\r\ndiff --git a/datasette/database.py b/datasette/database.py\r\nindex 3579cce..b466b12 100644\r\n--- a/datasette/database.py\r\n+++ b/datasette/database.py\r\n@@ -224,6 +226,9 @@ class Database:\r\n # Try to get counts for each table, $limit timeout for each count\r\n counts = {}\r\n for table in await self.table_names():\r\n+ if table == \"SpatialIndex\":\r\n+ counts[table] = 0\r\n+ continue\r\n try:\r\n table_count = (\r\n await self.execute(\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 837308703, "label": "Figure out why SpatiaLite 5.0 hangs the database page on Linux"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1268#issuecomment-803773484", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1268", "id": 803773484, "node_id": "MDEyOklzc3VlQ29tbWVudDgwMzc3MzQ4NA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-22T05:32:29Z", "updated_at": "2021-03-22T05:32:29Z", "author_association": "OWNER", "body": "To figure out which SQL query triggers the problem I added this code to write to a log file:\r\n```python\r\n with sqlite_timelimit(conn, time_limit_ms):\r\n try:\r\n cursor = conn.cursor()\r\n with open(\"/tmp/sql.log\", \"ab\", buffering=0) as fp:\r\n fp.write((\"{}: 
{}\\n\".format(sql, params)).encode(\"utf-8\"))\r\n cursor.execute(sql, params if params is not None else {})\r\n```\r\nI had to use `ab` binary mode because Python doesn't allow `buffering=0` for non-binary file operations.\r\n\r\nWith the log enabled, I used `docker exec -it 589ae68de943 bash` to attach to the running container and `tail -f /tmp/sql.log` to see the logs. Here's where it broke:\r\n\r\n```\r\nselect count(*) from [idx_civici_geom_parent]: None\r\nselect count(*) from [sqlite_stat1]: None\r\nselect count(*) from [sqlite_stat3]: None\r\nselect count(*) from [SpatialIndex]: None\r\n```\r\nSo attempting to run a `count(*)` against the `SpatialIndex` virtual table is the thing that triggers the bug.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 837308703, "label": "Figure out why SpatiaLite 5.0 hangs the database page on Linux"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1268#issuecomment-803764919", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1268", "id": 803764919, "node_id": "MDEyOklzc3VlQ29tbWVudDgwMzc2NDkxOQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-22T05:11:11Z", "updated_at": "2021-03-22T05:11:11Z", "author_association": "OWNER", "body": "Maybe I could implement SQLite query timeouts using the `interrupt()` method instead of the progress handler hack I'm currently using?\r\n\r\nhttps://stackoverflow.com/questions/43240496/python-sqlite3-how-to-quickly-and-cleanly-interrupt-long-running-query-with-e has some tips.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 837308703, "label": "Figure out why SpatiaLite 5.0 hangs the database page on Linux"}, "performed_via_github_app": null} {"html_url": 
"https://github.com/simonw/datasette/issues/1268#issuecomment-803764200", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1268", "id": 803764200, "node_id": "MDEyOklzc3VlQ29tbWVudDgwMzc2NDIwMA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-22T05:09:13Z", "updated_at": "2021-03-22T05:09:13Z", "author_association": "OWNER", "body": "I tried building a container where the `conn.set_progress_handler(handler, n)` line was commented out... and it fixed the bug.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 837308703, "label": "Figure out why SpatiaLite 5.0 hangs the database page on Linux"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1268#issuecomment-803762969", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1268", "id": 803762969, "node_id": "MDEyOklzc3VlQ29tbWVudDgwMzc2Mjk2OQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-22T05:05:51Z", "updated_at": "2021-03-22T05:05:51Z", "author_association": "OWNER", "body": "I had to run `docker kill 16197781a7b5` to kill the broken container - Ctrl+C in the Datasette console window didn't do anything.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 837308703, "label": "Figure out why SpatiaLite 5.0 hangs the database page on Linux"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1268#issuecomment-803762609", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1268", "id": 803762609, "node_id": "MDEyOklzc3VlQ29tbWVudDgwMzc2MjYwOQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-22T05:05:00Z", "updated_at": "2021-03-22T05:05:00Z", "author_association": "OWNER", "body": "Using 
https://til.simonwillison.net/docker/attach-bash-to-running-container - I figured out how to run `gdb`. I had to use `--privileged` here because otherwise `gdb` showed a \"Could not attach to process\" error.\r\n```\r\ndocker exec --privileged -it 16197781a7b5 bash\r\n# apt-get install gdb python3-dbg\r\n# gdb /usr/bin/python3 -p 20\r\n```\r\nThis paused the process. I tried running this:\r\n```\r\n(gdb) py-bt\r\nTraceback (most recent call first):\r\n File \"/usr/lib/python3.8/asyncio/base_events.py\", line 1845, in _run_once\r\n if handle._cancelled:\r\n File \"/usr/lib/python3.8/asyncio/base_events.py\", line 570, in run_forever\r\n self._run_once()\r\n File \"/usr/lib/python3.8/asyncio/base_events.py\", line 603, in run_until_complete\r\n self.run_forever()\r\n File \"/usr/local/lib/python3.8/dist-packages/uvicorn/server.py\", line 49, in run\r\n loop.run_until_complete(self.serve(sockets=sockets))\r\n File \"/usr/local/lib/python3.8/dist-packages/uvicorn/main.py\", line 386, in run\r\n server.run()\r\n File \"/usr/local/lib/python3.8/dist-packages/datasette/cli.py\", line 575, in serve\r\n uvicorn.run(ds.app(), **uvicorn_kwargs)\r\n File \"/usr/local/lib/python3.8/dist-packages/click/core.py\", line 610, in invoke\r\n return callback(*args, **kwargs)\r\n File \"/usr/local/lib/python3.8/dist-packages/click/core.py\", line 1066, in invoke\r\n return ctx.invoke(self.callback, **ctx.params)\r\n File \"/usr/local/lib/python3.8/dist-packages/click/core.py\", line 1259, in invoke\r\n return _process_result(sub_ctx.command.invoke(sub_ctx))\r\n File \"/usr/local/lib/python3.8/dist-packages/click/core.py\", line 782, in main\r\n rv = self.invoke(ctx)\r\n File \"/usr/local/lib/python3.8/dist-packages/click/core.py\", line 829, in __call__\r\n return self.main(*args, **kwargs)\r\n File \"/usr/local/bin/datasette\", line 8, in \r\n sys.exit(cli())\r\n \r\n File \"/usr/lib/python3.8/trace.py\", line 450, in runctx\r\n exec(cmd, globals, locals)\r\n File 
\"/usr/lib/python3.8/trace.py\", line 6632, in main\r\n File \"/usr/lib/python3.8/trace.py\", line 756, in \r\n main()\r\n \r\n File \"/usr/lib/python3.8/runpy.py\", line 343, in _run_code\r\n File \"/usr/lib/python3.8/runpy.py\", line 450, in _run_module_as_main\r\n```\r\nNot sure if that's useful or not.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 837308703, "label": "Figure out why SpatiaLite 5.0 hangs the database page on Linux"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1268#issuecomment-803759051", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1268", "id": 803759051, "node_id": "MDEyOklzc3VlQ29tbWVudDgwMzc1OTA1MQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-22T04:55:22Z", "updated_at": "2021-03-22T04:55:22Z", "author_association": "OWNER", "body": "So I think there's a bug in the way the `set_progress_handler()` mechanism works when used in conjunction with SpatiaLite 5.0 on Linux.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 837308703, "label": "Figure out why SpatiaLite 5.0 hangs the database page on Linux"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1268#issuecomment-803758793", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1268", "id": 803758793, "node_id": "MDEyOklzc3VlQ29tbWVudDgwMzc1ODc5Mw==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-22T04:54:32Z", "updated_at": "2021-03-22T04:54:32Z", "author_association": "OWNER", "body": "Hitting http://localhost:8001/tuscany_housenumbers triggers the bug. 
It gets stuck in a loop that looks like this:\r\n\r\n\"datasette_\u2014_root_16197781a7b5____\u2014_com_docker_cli_\u25c2_docker_run_-it_-p_8001_8001_-v___Dropbox_Development_datasette__mnt_datasette-spatialite_latest_bash_\u2014_195\u00d748_and_getIncidentsGit_\u2014_-zsh_\u2014_162\u00d760\"\r\n\r\nWhich looks to me like this code: https://github.com/simonw/datasette/blob/8e18c7943181f228ce5ebcea48deb59ce50bee1f/datasette/utils/__init__.py#L139-L158", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 837308703, "label": "Figure out why SpatiaLite 5.0 hangs the database page on Linux"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1268#issuecomment-803758182", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1268", "id": 803758182, "node_id": "MDEyOklzc3VlQ29tbWVudDgwMzc1ODE4Mg==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-22T04:52:15Z", "updated_at": "2021-03-22T04:52:15Z", "author_association": "OWNER", "body": "Hitting http://localhost:8001/ successfully shows the homepage (after a lot more scrolling).", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 837308703, "label": "Figure out why SpatiaLite 5.0 hangs the database page on Linux"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1268#issuecomment-803757746", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1268", "id": 803757746, "node_id": "MDEyOklzc3VlQ29tbWVudDgwMzc1Nzc0Ng==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-22T04:50:40Z", "updated_at": "2021-03-22T04:51:52Z", "author_association": "OWNER", "body": "Here's a fun debugging trick:\r\n\r\n docker run -it -p 8001:8001 -v `pwd`:/mnt 
datasette-spatialite:latest bash\r\n root@16197781a7b5:/# python3 -m trace --trace $(which datasette) \\\r\n -p 8001 -h 0.0.0.0 /mnt/tuscany_housenumbers.sqlite \\\r\n --load-extension=spatialite\r\n\r\nA huge amount of stuff scrolls past as Datasette starts up, since we are tracing every executed line of Python.\r\n\r\nAfter about a minute it's finished starting and gets to this point:\r\n\r\n```\r\nselectors.py(452): if timeout is None:\r\nselectors.py(454): elif timeout <= 0:\r\nselectors.py(459): timeout = math.ceil(timeout * 1e3) * 1e-3\r\nselectors.py(464): max_ev = max(len(self._fd_to_key), 1)\r\nselectors.py(466): ready = []\r\nselectors.py(467): try:\r\nselectors.py(468): fd_event_list = self._selector.poll(timeout, max_ev)\r\n```\r\nNow I can make some HTTP requests against it.\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 837308703, "label": "Figure out why SpatiaLite 5.0 hangs the database page on Linux"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1268#issuecomment-803756495", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1268", "id": 803756495, "node_id": "MDEyOklzc3VlQ29tbWVudDgwMzc1NjQ5NQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-22T04:46:04Z", "updated_at": "2021-03-22T04:46:04Z", "author_association": "OWNER", "body": "`gdb` may be able to help debug this: https://www.podoliaka.org/2016/04/10/debugging-cpython-gdb/", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 837308703, "label": "Figure out why SpatiaLite 5.0 hangs the database page on Linux"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1249#issuecomment-803755698", "issue_url": 
"https://api.github.com/repos/simonw/datasette/issues/1249", "id": 803755698, "node_id": "MDEyOklzc3VlQ29tbWVudDgwMzc1NTY5OA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-22T04:43:02Z", "updated_at": "2021-03-22T04:43:02Z", "author_association": "OWNER", "body": "I'll spin off a separate ticket to investigate the hang.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 824064069, "label": "Updated Dockerfile with SpatiaLite version 5.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1267#issuecomment-803754226", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1267", "id": 803754226, "node_id": "MDEyOklzc3VlQ29tbWVudDgwMzc1NDIyNg==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-22T04:37:26Z", "updated_at": "2021-03-22T04:37:26Z", "author_association": "OWNER", "body": "Thanks for doing this - I've used alternativeto.net a bunch in the past, it's great to see Datasette listed there.\r\n\r\nThis does raise some interesting philosophical questions: three years into the project I'm still not entirely sure what Datasette competes with! 
Could be SQLite desktop packages, could be visualization software like Tableau, could even be something like Airtable (given a few more plugins).\r\n\r\nIt will be interesting to see how the alternativeto listing evolves, maybe it will help me answer that question!", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 837208901, "label": "Update Datasette alternativeto listening with details"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1249#issuecomment-803753388", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1249", "id": 803753388, "node_id": "MDEyOklzc3VlQ29tbWVudDgwMzc1MzM4OA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-22T04:34:20Z", "updated_at": "2021-03-22T04:35:10Z", "author_association": "OWNER", "body": "Well this is frustrating. I finally found a Dockerfile that worked and installed an Ubuntu pre-compiled SpatiaLite module that would load...\r\n```dockerfile\r\nFROM ubuntu:20.10 as install_spatialite\r\n\r\nRUN apt update && \\\r\n apt install -y libsqlite3-mod-spatialite && \\\r\n apt clean && \\\r\n rm -rf /var/lib/{apt,dpkg,cache,log}/\r\n\r\nFROM ubuntu:20.10\r\n\r\nRUN apt update && \\\r\n apt install -y python3-pip && \\\r\n apt clean && \\\r\n rm -rf /var/lib/{apt,dpkg,cache,log}/\r\n\r\nRUN pip install datasette\r\n\r\n# Copy spatial extensions\r\nCOPY --from=install_spatialite /usr/lib/x86_64-linux-gnu /usr/lib/x86_64-linux-gnu/\r\n\r\nENV LD_LIBRARY_PATH=/usr/local/lib\r\n\r\nEXPOSE 8001\r\nCMD [\"datasette\"]\r\n```\r\n(Which produced a 550MB image)\r\n\r\nAnd when I ran Datasette I got that same error where the database listing page hangs!\r\n```\r\ndocker run -p 8001:8001 -v `pwd`:/mnt datasette-spatialite:latest datasette -p 8001 -h 0.0.0.0 /mnt/tuscany_housenumbers.sqlite --load-extension=spatialite\r\n```", "reactions": 
"{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 824064069, "label": "Updated Dockerfile with SpatiaLite version 5.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1249#issuecomment-803751068", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1249", "id": 803751068, "node_id": "MDEyOklzc3VlQ29tbWVudDgwMzc1MTA2OA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-22T04:26:45Z", "updated_at": "2021-03-22T04:26:45Z", "author_association": "OWNER", "body": "Here's why:\r\n```\r\ndatasette % docker run -it -p 8001:8001 -v `pwd`:/mnt datasette-spatialite:latest bash \r\nroot@3430352ff378:/# datasette\r\nbash: /usr/local/bin/datasette: /usr/bin/python3: bad interpreter: No such file or directory\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 824064069, "label": "Updated Dockerfile with SpatiaLite version 5.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1249#issuecomment-803750617", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1249", "id": 803750617, "node_id": "MDEyOklzc3VlQ29tbWVudDgwMzc1MDYxNw==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-22T04:25:14Z", "updated_at": "2021-03-22T04:25:14Z", "author_association": "OWNER", "body": "Got this error attempting to run Datasette (with or without SpatiaLite):\r\n```\r\nstandard_init_linux.go:219: exec user process caused: no such file or directory\r\n```\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 824064069, "label": "Updated Dockerfile with SpatiaLite version 5.0"}, "performed_via_github_app": 
null} {"html_url": "https://github.com/simonw/datasette/issues/1249#issuecomment-803750399", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1249", "id": 803750399, "node_id": "MDEyOklzc3VlQ29tbWVudDgwMzc1MDM5OQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-22T04:24:25Z", "updated_at": "2021-03-22T04:24:25Z", "author_association": "OWNER", "body": "I'll try using `ubuntu:20.10` for everything:\r\n```dockerfile\r\nFROM ubuntu:20.10 as install_spatialite\r\n\r\nRUN apt update && \\\r\n apt install -y libsqlite3-mod-spatialite && \\\r\n apt clean && \\\r\n rm -rf /var/lib/{apt,dpkg,cache,log}/\r\n\r\nFROM ubuntu:20.10 as build\r\n\r\nRUN apt update && \\\r\n apt install -y python3-pip && \\\r\n apt clean && \\\r\n rm -rf /var/lib/{apt,dpkg,cache,log}/\r\n\r\nRUN pip install datasette\r\n\r\n#COPY . /datasette\r\n#RUN pip install /datasette\r\n\r\nFROM ubuntu:20.10\r\n\r\n# Copy python dependencies and spatialite libraries\r\nCOPY --from=build /usr/local/lib/ /usr/local/lib/\r\n# Copy executables\r\nCOPY --from=build /usr/local/bin /usr/local/bin\r\n# Copy spatial extensions\r\nCOPY --from=install_spatialite /usr/lib/x86_64-linux-gnu/mod_spatialite.so /usr/lib/x86_64-linux-gnu/mod_spatialite.so\r\n\r\nENV LD_LIBRARY_PATH=/usr/local/lib\r\n\r\nEXPOSE 8001\r\nCMD [\"datasette\"]\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 824064069, "label": "Updated Dockerfile with SpatiaLite version 5.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1249#issuecomment-803749831", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1249", "id": 803749831, "node_id": "MDEyOklzc3VlQ29tbWVudDgwMzc0OTgzMQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-22T04:22:35Z", "updated_at": "2021-03-22T04:22:35Z", "author_association": "OWNER", 
"body": "I tried copying just the `mod_spatialite.so` file:\r\n```dockerfile\r\nFROM ubuntu:20.10 as install_spatialite\r\n\r\nRUN apt update && \\\r\n apt install -y libsqlite3-mod-spatialite && \\\r\n apt clean && \\\r\n rm -rf /var/lib/{apt,dpkg,cache,log}/\r\n\r\nFROM python:3.9.2-slim as build\r\n\r\nRUN pip install datasette\r\n\r\n#COPY . /datasette\r\n#RUN pip install /datasette\r\n\r\nFROM python:3.9.2-slim\r\n\r\n# Copy python dependencies and spatialite libraries\r\nCOPY --from=build /usr/local/lib/ /usr/local/lib/\r\n# Copy executables\r\nCOPY --from=build /usr/local/bin /usr/local/bin\r\n# Copy spatial extensions\r\nCOPY --from=install_spatialite /usr/lib/x86_64-linux-gnu/mod_spatialite.so /usr/lib/x86_64-linux-gnu/mod_spatialite.so\r\n\r\nENV LD_LIBRARY_PATH=/usr/local/lib\r\n\r\nEXPOSE 8001\r\nCMD [\"datasette\"]\r\n```\r\nBut when I ran Datasette with `--load-extension=spatialite` I got this:\r\n```\r\n File \"/usr/local/lib/python3.9/site-packages/datasette/database.py\", line 151, in in_thread\r\n self.ds._prepare_connection(conn, self.name)\r\n File \"/usr/local/lib/python3.9/site-packages/datasette/app.py\", line 502, in _prepare_connection\r\n conn.execute(f\"SELECT load_extension('{extension}')\")\r\nsqlite3.OperationalError: /usr/lib/x86_64-linux-gnu/mod_spatialite.so.so: cannot open shared object file: No such file or directory\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 824064069, "label": "Updated Dockerfile with SpatiaLite version 5.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1249#issuecomment-803748469", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1249", "id": 803748469, "node_id": "MDEyOklzc3VlQ29tbWVudDgwMzc0ODQ2OQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-22T04:17:51Z", "updated_at": 
"2021-03-22T04:17:51Z", "author_association": "OWNER", "body": "... except my clever image using SpatiaLite installed for Ubuntu doesn't actually work:\r\n\r\n```\r\ndatasette % docker run -p 8001:8001 -v `pwd`:/mnt datasette-spatialite:latest datasette -p 8001 -h 0.0.0.0 /mnt/fixtures.db\r\n File \"/usr/local/lib/python3.9/sqlite3/dbapi2.py\", line 27, in \r\n from _sqlite3 import *\r\nImportError: /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.29' not found (required by /usr/lib/x86_64-linux-gnu/libsqlite3.so.0)\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 824064069, "label": "Updated Dockerfile with SpatiaLite version 5.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1249#issuecomment-803748158", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1249", "id": 803748158, "node_id": "MDEyOklzc3VlQ29tbWVudDgwMzc0ODE1OA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-22T04:16:57Z", "updated_at": "2021-03-22T04:16:57Z", "author_association": "OWNER", "body": "Which is great, because the image on Docker Hub right now is 383MB.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 824064069, "label": "Updated Dockerfile with SpatiaLite version 5.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1249#issuecomment-803747701", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1249", "id": 803747701, "node_id": "MDEyOklzc3VlQ29tbWVudDgwMzc0NzcwMQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-22T04:15:40Z", "updated_at": "2021-03-22T04:15:40Z", "author_association": "OWNER", "body": "Here's a trick: install SpatiaLite in `ubuntu:20.10` and then copy it into 
the final `python:3.9.2-slim` image.\r\n\r\n```dockerfile\r\nFROM ubuntu:20.10 as install_spatialite\r\n\r\nRUN apt update && \\\r\n apt install -y libsqlite3-mod-spatialite && \\\r\n apt clean && \\\r\n rm -rf /var/lib/{apt,dpkg,cache,log}/\r\n\r\nFROM python:3.9.2-slim as build\r\n\r\nRUN pip install datasette\r\n\r\n#COPY . /datasette\r\n#RUN pip install /datasette\r\n\r\nFROM python:3.9.2-slim\r\n\r\n# Copy python dependencies and spatialite libraries\r\nCOPY --from=build /usr/local/lib/ /usr/local/lib/\r\n# Copy executables\r\nCOPY --from=build /usr/local/bin /usr/local/bin\r\n# Copy spatial extensions\r\nCOPY --from=install_spatialite /usr/lib/x86_64-linux-gnu /usr/lib/x86_64-linux-gnu\r\n\r\nENV LD_LIBRARY_PATH=/usr/local/lib\r\n\r\nEXPOSE 8001\r\nCMD [\"datasette\"]\r\n```\r\nThat produced a 265MB image.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 824064069, "label": "Updated Dockerfile with SpatiaLite version 5.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1249#issuecomment-803700940", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1249", "id": 803700940, "node_id": "MDEyOklzc3VlQ29tbWVudDgwMzcwMDk0MA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-22T01:14:24Z", "updated_at": "2021-03-22T01:14:24Z", "author_association": "OWNER", "body": "I tried that with just `python3-pip` (removing `libsqlite3-mod-spatialite`) and got 435MB.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 824064069, "label": "Updated Dockerfile with SpatiaLite version 5.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1249#issuecomment-803700626", "issue_url": 
"https://api.github.com/repos/simonw/datasette/issues/1249", "id": 803700626, "node_id": "MDEyOklzc3VlQ29tbWVudDgwMzcwMDYyNg==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-22T01:13:04Z", "updated_at": "2021-03-22T01:13:04Z", "author_association": "OWNER", "body": "Building a Dockerfile containing just `FROM ubuntu:20.10` gave me `79.5MB`.\r\n\r\nBuilding this one:\r\n```dockerfile\r\nFROM ubuntu:20.10\r\n\r\n# Setup build dependencies\r\nRUN apt update && \\\r\n apt install -y python3-pip libsqlite3-mod-spatialite && \\\r\n apt clean && \\\r\n rm -rf /var/lib/{apt,dpkg,cache,log}/\r\n```\r\nResulted in a 515MB image.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 824064069, "label": "Updated Dockerfile with SpatiaLite version 5.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1249#issuecomment-803698983", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1249", "id": 803698983, "node_id": "MDEyOklzc3VlQ29tbWVudDgwMzY5ODk4Mw==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-22T01:05:36Z", "updated_at": "2021-03-22T01:06:23Z", "author_association": "OWNER", "body": "It's pretty big though. 
I tried this version which avoids copying junk from my laptop in:\r\n\r\n```dockerfile\r\nFROM ubuntu:20.10\r\n\r\n# Setup build dependencies\r\nRUN apt update && apt install -y python3-pip libsqlite3-mod-spatialite && apt clean\r\n\r\nRUN pip install datasette\r\n\r\nEXPOSE 8001\r\nCMD [\"datasette\"]\r\n```\r\nAnd got this:\r\n```\r\ndatasette % docker images datasette-spatialite \r\nREPOSITORY TAG IMAGE ID CREATED SIZE\r\ndatasette-spatialite latest 0796950653c2 2 seconds ago 528MB\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 824064069, "label": "Updated Dockerfile with SpatiaLite version 5.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1249#issuecomment-803698168", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1249", "id": 803698168, "node_id": "MDEyOklzc3VlQ29tbWVudDgwMzY5ODE2OA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-22T01:02:02Z", "updated_at": "2021-03-22T01:02:30Z", "author_association": "OWNER", "body": "This is the shortest Dockerfile that appeared to give me a working SpatiaLite module:\r\n```dockerfile\r\nFROM ubuntu:20.10\r\n\r\n# Setup build dependencies\r\nRUN apt update && apt install -y python3-pip libsqlite3-mod-spatialite && apt clean\r\n\r\n# Add local code to the image instead of fetching from pypi.\r\nCOPY . 
/datasette\r\n\r\nRUN pip install /datasette\r\n\r\nRUN rm -rf /datasette\r\n\r\nEXPOSE 8001\r\nCMD [\"datasette\"]\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 824064069, "label": "Updated Dockerfile with SpatiaLite version 5.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1249#issuecomment-803697546", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1249", "id": 803697546, "node_id": "MDEyOklzc3VlQ29tbWVudDgwMzY5NzU0Ng==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-22T00:59:47Z", "updated_at": "2021-03-22T00:59:47Z", "author_association": "OWNER", "body": "To debug I'm running:\r\n\r\n docker run -it -p 8001:8001 -v `pwd`:/mnt datasette-spatialite:latest bash\r\n\r\nThis gets me a shell I can use.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 824064069, "label": "Updated Dockerfile with SpatiaLite version 5.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1249#issuecomment-803697211", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1249", "id": 803697211, "node_id": "MDEyOklzc3VlQ29tbWVudDgwMzY5NzIxMQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-22T00:58:01Z", "updated_at": "2021-03-22T00:58:01Z", "author_association": "OWNER", "body": "I'm messing around with the `Dockerfile` and after each change I'm running:\r\n\r\n docker build . 
-t datasette-spatialite\r\n\r\nAnd then:\r\n\r\n docker run -p 8001:8001 -v `pwd`:/mnt datasette-spatialite:latest datasette -p 8001 -h 0.0.0.0 /mnt/fixtures.db\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 824064069, "label": "Updated Dockerfile with SpatiaLite version 5.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1249#issuecomment-803694661", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1249", "id": 803694661, "node_id": "MDEyOklzc3VlQ29tbWVudDgwMzY5NDY2MQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-22T00:46:49Z", "updated_at": "2021-03-22T00:46:49Z", "author_association": "OWNER", "body": "Actually for the loadable module I think I need https://packages.ubuntu.com/groovy/libsqlite3-mod-spatialite", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 824064069, "label": "Updated Dockerfile with SpatiaLite version 5.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1249#issuecomment-803694436", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1249", "id": 803694436, "node_id": "MDEyOklzc3VlQ29tbWVudDgwMzY5NDQzNg==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-22T00:46:00Z", "updated_at": "2021-03-22T00:46:00Z", "author_association": "OWNER", "body": "So I'm going to try `20.10` and see where that gets me.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 824064069, "label": "Updated Dockerfile with SpatiaLite version 5.0"}, "performed_via_github_app": null} {"html_url": 
"https://github.com/simonw/datasette/issues/1249#issuecomment-803694359", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1249", "id": 803694359, "node_id": "MDEyOklzc3VlQ29tbWVudDgwMzY5NDM1OQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-22T00:45:47Z", "updated_at": "2021-03-22T00:45:47Z", "author_association": "OWNER", "body": "https://pythonspeed.com/articles/base-image-python-docker-images/ suggests using `python:3.9-slim-buster` or `ubuntu:20.04` - but 20.04 is focal which still has SpatiaLite `4.3.0a-6build1` - It's `20.10` that has 5.0: https://packages.ubuntu.com/groovy/libspatialite-dev", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 824064069, "label": "Updated Dockerfile with SpatiaLite version 5.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1249#issuecomment-803693181", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1249", "id": 803693181, "node_id": "MDEyOklzc3VlQ29tbWVudDgwMzY5MzE4MQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-22T00:41:02Z", "updated_at": "2021-03-22T00:41:02Z", "author_association": "OWNER", "body": "Debian sid has it too: https://packages.debian.org/sid/libspatialite-dev", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 824064069, "label": "Updated Dockerfile with SpatiaLite version 5.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1249#issuecomment-803692673", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1249", "id": 803692673, "node_id": "MDEyOklzc3VlQ29tbWVudDgwMzY5MjY3Mw==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-22T00:38:42Z", "updated_at": 
"2021-03-22T00:38:42Z", "author_association": "OWNER", "body": "Ubuntu Groovy has a package for SpatiaLite 5 - I could try using that instead: https://packages.ubuntu.com/groovy/libspatialite-dev", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 824064069, "label": "Updated Dockerfile with SpatiaLite version 5.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1249#issuecomment-803691236", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1249", "id": 803691236, "node_id": "MDEyOklzc3VlQ29tbWVudDgwMzY5MTIzNg==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-22T00:32:03Z", "updated_at": "2021-03-22T00:32:03Z", "author_association": "OWNER", "body": "Here's something odd: when I run `datasette tuscany_housenumbers.sqlite --load-extension=spatialite` on macOS against SpatiaLite installed using Homebrew (which reports `\"spatialite\": \"5.0.0\"` on the `/-/versions` page) I don't get any weird errors at all, everything works fine.\r\n\r\nBut when I tried compiling SpatiaLite inside the Docker container I had hanging errors on some pages.\r\n\r\nThis is using https://www.gaia-gis.it/gaia-sins/knn/tuscany_housenumbers.7z from the SpatiaLite KNN tutorial at https://www.gaia-gis.it/fossil/libspatialite/wiki?name=KNN", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 824064069, "label": "Updated Dockerfile with SpatiaLite version 5.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1259#issuecomment-803674728", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1259", "id": 803674728, "node_id": "MDEyOklzc3VlQ29tbWVudDgwMzY3NDcyOA==", "user": {"value": 9599, "label": "simonw"}, "created_at": 
"2021-03-21T22:55:31Z", "updated_at": "2021-03-21T22:55:31Z", "author_association": "OWNER", "body": "CTEs were added in 2014-02-03 SQLite 3.8.3 - so I think it's OK to depend on them for Datasette.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 830567275, "label": "Research using CTEs for faster facet counts"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/647#issuecomment-803673225", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/647", "id": 803673225, "node_id": "MDEyOklzc3VlQ29tbWVudDgwMzY3MzIyNQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-21T22:44:19Z", "updated_at": "2021-03-21T22:44:19Z", "author_association": "OWNER", "body": "Now that I'm looking at refactoring how views work in #878 it's clear that the gnarliest, most convoluted code I need to deal with relates to this old feature.\r\n\r\nI'm going to remove it entirely. 
Any performance enhancement or provides can be achieved just as well by using regular URLs and a caching proxy.\r\n\r\nI may provide a 404 handling plugin that attempts to rewrite old URLs that used this mechanism, but I won't do any more than that.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 531755959, "label": "Move hashed URL mode out to a plugin"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/249#issuecomment-803501756", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/249", "id": 803501756, "node_id": "MDEyOklzc3VlQ29tbWVudDgwMzUwMTc1Ng==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-21T02:33:45Z", "updated_at": "2021-03-21T02:33:45Z", "author_association": "OWNER", "body": "Did you run `enable-fts` before you inserted the data?\r\n\r\nIf so you'll need to run `populate-fts` after the insert to populate the FTS index.\r\n\r\nA better solution may be to add `--create-triggers` to the `enable-fts` command to add triggers that will automatically keep the index updated as you insert new records.", "reactions": "{\"total_count\": 1, \"+1\": 1, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 836963850, "label": "Full text search possibly broken?"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/878#issuecomment-803473015", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/878", "id": 803473015, "node_id": "MDEyOklzc3VlQ29tbWVudDgwMzQ3MzAxNQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-20T22:33:05Z", "updated_at": "2021-03-20T22:33:05Z", "author_association": "OWNER", "body": "Things this mechanism needs to be able to support:\r\n\r\n- Returning a default JSON representation\r\n- Defining \"extra\" 
JSON representations blocks, which can be requested using `?_extra=`\r\n- Returning rendered HTML, based on the default JSON + one or more extras + a template\r\n- Using Datasette output renderers to return e.g. CSV data\r\n- Potentially also supporting streaming output renderers for streaming CSV/TSV/JSON-nl etc", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 648435885, "label": "New pattern for views that return either JSON or HTML, available for plugins"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/878#issuecomment-803472595", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/878", "id": 803472595, "node_id": "MDEyOklzc3VlQ29tbWVudDgwMzQ3MjU5NQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-20T22:28:12Z", "updated_at": "2021-03-20T22:28:12Z", "author_association": "OWNER", "body": "Another idea I had: a view is a class that takes the `datasette` instance in its constructor, and defines a `__call__` method that accepts a request and returns a response. 
Except `await __call__` looks like it might be a bit messy, discussion in https://github.com/encode/starlette/issues/886", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 648435885, "label": "New pattern for views that return either JSON or HTML, available for plugins"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/878#issuecomment-803472278", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/878", "id": 803472278, "node_id": "MDEyOklzc3VlQ29tbWVudDgwMzQ3MjI3OA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-20T22:25:04Z", "updated_at": "2021-03-20T22:25:04Z", "author_association": "OWNER", "body": "I came up with a slightly wild idea for this that would involve pytest-style dependency injection.\r\n\r\nPrototype here: https://gist.github.com/simonw/496b24fdad44f6f8b7237fe394a0ced7\r\n\r\nCopying from my private notes:\r\n\r\n> Use the lazy evaluated DI mechanism to break up table view into different pieces eg for faceting\r\n> \r\n> Use that to solve JSON vs HTML views\r\n> \r\n> Oh here's an idea: what if the various components of the table view were each defined as async functions.... and then executed using asyncio.gather in order to run the SQL queries in parallel? Then try benchmarking with different numbers of threads?\r\n> \r\n> The async_call_with_arguments function could do this automatically for any awaitable dependencies\r\n> \r\n> This would give me massively parallel dependency injection\r\n> \r\n> (I could build an entire framework around this and call it c64)\r\n> \r\n> Idea: arguments called eg \"count\" are executed and the result passed to the function. If called count_fn then a reference to the not-yet-called function is passed instead \r\n> \r\n> I'm not going to completely combine the views mechanism and the render hooks. 
Instead, the core view will define a bunch of functions used to compose the page and the render hook will have conditional access to those functions - which will otherwise be asyncio.gather executed directly by the HTML page version\r\n> \r\n> Using asyncio.gather to execute facets and suggest facets in parallel would be VERY interesting \r\n> \r\n> suggest facets should be VERY cachable - doesn't matter if it's wrong unlike actual facets themselves\r\n> \r\n> What if all Datasette views were defined in terms of dependency injection - and those dependency functions could themselves depend on others just like pytest fixtures. Everything would become composable and async stuff could execute in parallel\r\n> \r\n> FURTHER IDEA: use this for the ?_extra= mechanism as well.\r\n> \r\n> Any view in Datasette can be defined as a collection of named keys. Each of those keys maps to a function or an async function that accepts as input other named keys, using DI to handle them.\r\n> \r\n> The HTML view is a defined function. So are the other outputs.\r\n> \r\n> Default original inputs include \u201crequest\u201d and \u201cdatasette\u201d.\r\n> \r\n> So\u2026 maybe a view function is a class methods that use DI. One of those methods as an .html() method used for the default page.\r\n> \r\n> Output formats are a bit more complicated because they are supposed to be defined separately in plugins. 
They are unified across query, row and table though.\r\n> \r\n> I\u2019m going to try breaking up the TableView to see what happens.", "reactions": "{\"total_count\": 1, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 1}", "issue": {"value": 648435885, "label": "New pattern for views that return either JSON or HTML, available for plugins"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/878#issuecomment-803471917", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/878", "id": 803471917, "node_id": "MDEyOklzc3VlQ29tbWVudDgwMzQ3MTkxNw==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-20T22:21:33Z", "updated_at": "2021-03-20T22:21:33Z", "author_association": "OWNER", "body": "This has been blocking things for too long.\r\n\r\nIf this becomes a documented pattern, things like adding a JSON output to https://github.com/dogsheep/dogsheep-beta becomes easier too.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 648435885, "label": "New pattern for views that return either JSON or HTML, available for plugins"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1258#issuecomment-803471702", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1258", "id": 803471702, "node_id": "MDEyOklzc3VlQ29tbWVudDgwMzQ3MTcwMg==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-20T22:19:39Z", "updated_at": "2021-03-20T22:19:39Z", "author_association": "OWNER", "body": "This is a good idea. 
I avoided this initially because it should be possible to run a canned query with a parameter set to the empty string, but that view could definitely be smart enough to differentiate between `?sql=...&param=` and `?sql=` with no `param` specified at all.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 828858421, "label": "Allow canned query params to specify default values"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/782#issuecomment-803469623", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/782", "id": 803469623, "node_id": "MDEyOklzc3VlQ29tbWVudDgwMzQ2OTYyMw==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-20T22:01:23Z", "updated_at": "2021-03-20T22:01:23Z", "author_association": "OWNER", "body": "I'm going to keep `?_shape=array` working on the assumption that many existing uses of the Datasette API are already using that option, so it would be nice not to break them.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 627794879, "label": "Redesign default .json format"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1261#issuecomment-803468314", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1261", "id": 803468314, "node_id": "MDEyOklzc3VlQ29tbWVudDgwMzQ2ODMxNA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-20T21:48:48Z", "updated_at": "2021-03-20T21:48:48Z", "author_association": "OWNER", "body": "That's fixed in this release of `datasette-publish-vercel`: https://github.com/simonw/datasette-publish-vercel/releases/tag/0.9.2", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, 
\"eyes\": 0}", "issue": {"value": 832092321, "label": "Some links aren't properly URL encoded."}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1261#issuecomment-803466868", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1261", "id": 803466868, "node_id": "MDEyOklzc3VlQ29tbWVudDgwMzQ2Njg2OA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-20T21:36:06Z", "updated_at": "2021-03-20T21:36:06Z", "author_association": "OWNER", "body": "This isn't a Datasette bug - it's a Vercel bug: https://github.com/simonw/datasette-publish-vercel/issues/28\r\n\r\nI'm looking at a fix for that now, so watch that issue for updates.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 832092321, "label": "Some links aren't properly URL encoded."}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1266#issuecomment-803466730", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1266", "id": 803466730, "node_id": "MDEyOklzc3VlQ29tbWVudDgwMzQ2NjczMA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-20T21:35:00Z", "updated_at": "2021-03-20T21:35:00Z", "author_association": "OWNER", "body": "https://docs.datasette.io/en/latest/internals.html#returning-a-response-with-asgi-send-send", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 836273891, "label": "Documentation for Response.asgi_send(send) method"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1265#issuecomment-803130332", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1265", "id": 803130332, "node_id": "MDEyOklzc3VlQ29tbWVudDgwMzEzMDMzMg==", "user": {"value": 9599, "label": 
"simonw"}, "created_at": "2021-03-19T21:03:09Z", "updated_at": "2021-03-19T21:03:09Z", "author_association": "OWNER", "body": "This is now available in `datasette-auth-passwords` 0.4! https://github.com/simonw/datasette-auth-passwords/releases/tag/0.4", "reactions": "{\"total_count\": 1, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 1, \"eyes\": 0}", "issue": {"value": 836123030, "label": "Support for HTTP Basic Authentication"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1262#issuecomment-802099264", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1262", "id": 802099264, "node_id": "MDEyOklzc3VlQ29tbWVudDgwMjA5OTI2NA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-18T16:43:09Z", "updated_at": "2021-03-18T16:43:09Z", "author_association": "OWNER", "body": "I often find myself wanting this too, when I'm exploring a new dataset.\r\n\r\nI agree with Bob that this is a good candidate for a plugin. 
The plugin system isn't quite set up for this yet though - there isn't an obvious mechanism for adding extra sort orders or other interface elements that manipulate the query used by the table view in some way.\r\n\r\nI'm going to promote this issue to status of a plugin hook feature request - I have a hunch that a plugin hook that enables `order by random()` could enable a lot of other useful plugin features too.", "reactions": "{\"total_count\": 2, \"+1\": 2, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 834602299, "label": "Plugin hook that could support 'order by random()' for table view"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/246#issuecomment-799479175", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/246", "id": 799479175, "node_id": "MDEyOklzc3VlQ29tbWVudDc5OTQ3OTE3NQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-15T14:47:31Z", "updated_at": "2021-03-15T14:47:31Z", "author_association": "OWNER", "body": "This is a smart feature. 
I have something that does this in Datasette, extracting it out to `sqlite-utils` makes a lot of sense.\r\n\r\nhttps://github.com/simonw/datasette/blob/8e18c7943181f228ce5ebcea48deb59ce50bee1f/datasette/utils/__init__.py#L818-L829", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 831751367, "label": "Escaping FTS search strings"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/236#issuecomment-799066252", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/236", "id": 799066252, "node_id": "MDEyOklzc3VlQ29tbWVudDc5OTA2NjI1Mg==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-15T03:34:52Z", "updated_at": "2021-03-15T03:34:52Z", "author_association": "OWNER", "body": "Yeah the Lambda Docker stuff is pretty odd - you still don't get to speak HTTP, you have to speak their custom event protocol instead.\r\n\r\nhttps://github.com/glassechidna/serverlessish looks interesting here - it adds a proxy inside the container which allows your existing HTTP Docker image to run within Docker-on-Lambda. 
I've not tried it out yet though.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 317001500, "label": "datasette publish lambda plugin"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1259#issuecomment-797827038", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1259", "id": 797827038, "node_id": "MDEyOklzc3VlQ29tbWVudDc5NzgyNzAzOA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-13T00:15:40Z", "updated_at": "2021-03-13T00:15:40Z", "author_association": "OWNER", "body": "If all of the facets were being calculated in a single query, I'd be willing to bump the facet time limit up to something a lot higher, maybe even a full second. There's a chance that could work amazingly well with a materialized CTE.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 830567275, "label": "Research using CTEs for faster facet counts"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1259#issuecomment-797804869", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1259", "id": 797804869, "node_id": "MDEyOklzc3VlQ29tbWVudDc5NzgwNDg2OQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-12T23:05:05Z", "updated_at": "2021-03-12T23:05:05Z", "author_association": "OWNER", "body": "I wonder if I could optimize facet suggestion in the same way?\r\n\r\nOne challenge: the query time limit will apply to the full CTE query, not to the individual columns.\r\n\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 830567275, "label": "Research using CTEs for faster facet counts"}, 
"performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1259#issuecomment-797801075", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1259", "id": 797801075, "node_id": "MDEyOklzc3VlQ29tbWVudDc5NzgwMTA3NQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-12T22:53:56Z", "updated_at": "2021-03-12T22:55:16Z", "author_association": "OWNER", "body": "OK, a better comparison:\r\n\r\nhttps://global-power-plants.datasettes.com/global-power-plants?sql=WITH+data+as+%28%0D%0A++select%0D%0A++++*%0D%0A++from%0D%0A++++%5Bglobal-power-plants%5D%0D%0A%29%2C%0D%0Acountry_long+as+%28select+%0D%0A++%27country_long%27+as+col%2C+country_long+as+value%2C+count%28*%29+as+c+from+data+group+by+country_long%0D%0A++order+by+c+desc+limit+31%0D%0A%29%2C%0D%0Aprimary_fuel+as+%28%0D%0Aselect%0D%0A++%27primary_fuel%27+as+col%2C+primary_fuel+as+value%2C+count%28*%29+as+c+from+data+group+by+primary_fuel%0D%0A++order+by+c+desc+limit+31%0D%0A%29%2C%0D%0Aowner+as+%28%0D%0Aselect%0D%0A++%27owner%27+as+col%2C+owner+as+value%2C+count%28*%29+as+c+from+data+group+by+owner%0D%0A++order+by+c+desc+limit+31%0D%0A%29%0D%0Aselect+*+from+primary_fuel+union+select+*+from+country_long%0D%0Aunion+select+*+from+owner+order+by+col%2C+c+desc calculates facets against three columns. 
It takes **78.5ms** (and 34.5ms when I refreshed it, presumably after warming some SQLite caches of some sort).\r\n\r\nhttps://global-power-plants.datasettes.com/global-power-plants/global-power-plants?_facet=country_long&_facet=primary_fuel&_trace=1&_size=0 shows those facets with size=0 on the SQL query - and shows a SQL trace at the bottom of the page.\r\n\r\nThe country_long facet query takes 45.36ms, owner takes 38.45ms, primary_fuel takes 49.04ms - so a total of 132.85ms\r\n\r\nThat's against SQLite 3.27.3, according to https://global-power-plants.datasettes.com/-/versions - so even on a SQLite version that doesn't materialize the CTEs there's a significant performance boost to doing all three facets in a single CTE query.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 830567275, "label": "Research using CTEs for faster facet counts"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1259#issuecomment-797790017", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1259", "id": 797790017, "node_id": "MDEyOklzc3VlQ29tbWVudDc5Nzc5MDAxNw==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-12T22:22:12Z", "updated_at": "2021-03-12T22:22:12Z", "author_association": "OWNER", "body": "https://sqlite.org/lang_with.html\r\n\r\n> Prior to SQLite 3.35.0, all CTEs where treated as if the NOT MATERIALIZED phrase was present\r\n\r\nIt looks like this optimization is completely unavailable on SQLite prior to 3.35.0 (released 12th March 2021). 
But I could still rewrite the faceting to work in this way, using the exact same SQL - it would just be significantly faster on 3.35.0+ (assuming it's actually faster in practice - would need to benchmark).", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 830567275, "label": "Research using CTEs for faster facet counts"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1193#issuecomment-797159434", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1193", "id": 797159434, "node_id": "MDEyOklzc3VlQ29tbWVudDc5NzE1OTQzNA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-12T01:01:54Z", "updated_at": "2021-03-12T01:01:54Z", "author_association": "OWNER", "body": "DuckDB has a read-only mechanism: https://duckdb.org/docs/api/python\r\n\r\n```python\r\nimport duckdb\r\ncon = duckdb.connect(database=\"/tmp/blah.db\", read_only=True)\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 787173276, "label": "Research plugin hook for alternative database backends"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1250#issuecomment-797159221", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1250", "id": 797159221, "node_id": "MDEyOklzc3VlQ29tbWVudDc5NzE1OTIyMQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-12T01:01:17Z", "updated_at": "2021-03-12T01:01:17Z", "author_association": "OWNER", "body": "This is a duplicate of #1193.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 824067604, "label": "Research: Plugin hook for alternative database connections"}, 
"performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/670#issuecomment-797158641", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/670", "id": 797158641, "node_id": "MDEyOklzc3VlQ29tbWVudDc5NzE1ODY0MQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-12T00:59:49Z", "updated_at": "2021-03-12T00:59:49Z", "author_association": "OWNER", "body": "> Challenge: what's the equivalent for PostgreSQL of opening a database in read only mode? Will I have to talk users through creating read only credentials?\r\n\r\nIt looks like the answer to this is yes - I'll need users to set up read-only credentials. Here's a TIL about that: https://til.simonwillison.net/postgresql/read-only-postgresql-user", "reactions": "{\"total_count\": 1, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 1, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 564833696, "label": "Prototoype for Datasette on PostgreSQL"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/pull/1211#issuecomment-796854370", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1211", "id": 796854370, "node_id": "MDEyOklzc3VlQ29tbWVudDc5Njg1NDM3MA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-11T16:15:29Z", "updated_at": "2021-03-11T16:15:29Z", "author_association": "OWNER", "body": "Thanks very much for this - it's really comprehensive. 
I need to bake some of these patterns into my coding habits better!", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 797649915, "label": "Use context manager instead of plain open"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/838#issuecomment-795918377", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/838", "id": 795918377, "node_id": "MDEyOklzc3VlQ29tbWVudDc5NTkxODM3Nw==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-10T19:01:48Z", "updated_at": "2021-03-10T19:01:48Z", "author_association": "OWNER", "body": "The biggest challenge here I think is to replicate the exact situation where this happens in a Python unit test. The fix should be easy once we have a test in place.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 637395097, "label": "Incorrect URLs when served behind a proxy with base_url set"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/838#issuecomment-795895436", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/838", "id": 795895436, "node_id": "MDEyOklzc3VlQ29tbWVudDc5NTg5NTQzNg==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-10T18:44:46Z", "updated_at": "2021-03-10T18:44:57Z", "author_association": "OWNER", "body": "Let's reopen this.", "reactions": "{\"total_count\": 1, \"+1\": 1, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 637395097, "label": "Incorrect URLs when served behind a proxy with base_url set"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/pull/1254#issuecomment-795870524", "issue_url": 
"https://api.github.com/repos/simonw/datasette/issues/1254", "id": 795870524, "node_id": "MDEyOklzc3VlQ29tbWVudDc5NTg3MDUyNA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-10T18:27:45Z", "updated_at": "2021-03-10T18:27:45Z", "author_association": "OWNER", "body": "What other breaks did you spot?", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 826613352, "label": "Update Docker Spatialite version to 5.0.1 + add support for Spatialite topology functions"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/pull/1256#issuecomment-795869144", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1256", "id": 795869144, "node_id": "MDEyOklzc3VlQ29tbWVudDc5NTg2OTE0NA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-10T18:26:46Z", "updated_at": "2021-03-10T18:26:46Z", "author_association": "OWNER", "body": "Thanks!", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 827341657, "label": "Minor type in IP adress"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/pull/1254#issuecomment-794439632", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1254", "id": 794439632, "node_id": "MDEyOklzc3VlQ29tbWVudDc5NDQzOTYzMg==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-09T20:53:02Z", "updated_at": "2021-03-09T20:53:02Z", "author_association": "OWNER", "body": "Thanks for catching that documentation update!", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 826613352, "label": "Update Docker Spatialite version to 5.0.1 + add support for Spatialite topology functions"}, 
"performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/pull/1254#issuecomment-794437715", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1254", "id": 794437715, "node_id": "MDEyOklzc3VlQ29tbWVudDc5NDQzNzcxNQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-09T20:51:19Z", "updated_at": "2021-03-09T20:51:19Z", "author_association": "OWNER", "body": "Did you see my note on https://github.com/simonw/datasette/issues/1249#issuecomment-792384382 about a weird issue I was having with the `/dbname` page hanging the server? Have you seen anything like that in your work here?", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 826613352, "label": "Update Docker Spatialite version to 5.0.1 + add support for Spatialite topology functions"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1250#issuecomment-792386484", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1250", "id": 792386484, "node_id": "MDEyOklzc3VlQ29tbWVudDc5MjM4NjQ4NA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-08T00:29:06Z", "updated_at": "2021-03-08T00:29:06Z", "author_association": "OWNER", "body": "DuckDB has a read-only mechanism: https://duckdb.org/docs/api/python\r\n\r\n```python\r\nimport duckdb\r\ncon = duckdb.connect(database=\"/tmp/blah.db\", read_only=True)\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 824067604, "label": "Research: Plugin hook for alternative database connections"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1248#issuecomment-792385274", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1248", "id": 792385274, "node_id": 
"MDEyOklzc3VlQ29tbWVudDc5MjM4NTI3NA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-08T00:25:10Z", "updated_at": "2021-03-08T00:25:10Z", "author_association": "OWNER", "body": "It's not possible yet, unfortunately. This came up on the forums recently: https://github.com/simonw/datasette/discussions/968\r\n\r\nI'm leaning further towards making the database connection layer itself work via a plugin hook, which would open up the possibility of supporting DuckDB and other databases as well. I've not committed to doing this yet though.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 823035080, "label": "duckdb database (very low performance in SQLite)"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1249#issuecomment-792384854", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1249", "id": 792384854, "node_id": "MDEyOklzc3VlQ29tbWVudDc5MjM4NDg1NA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-08T00:23:38Z", "updated_at": "2021-03-08T00:23:38Z", "author_association": "OWNER", "body": "One reason to prioritize this issue: Homebrew upgraded to SpatiaLite 5.0 recently https://formulae.brew.sh/formula/spatialite-tools and as a result SpatiaLite databases created on my laptop don't appear to be compatible with Datasette when published using `datasette publish`.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 824064069, "label": "Updated Dockerfile with SpatiaLite version 5.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1249#issuecomment-792384382", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1249", "id": 792384382, "node_id": 
"MDEyOklzc3VlQ29tbWVudDc5MjM4NDM4Mg==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-08T00:22:02Z", "updated_at": "2021-03-08T00:22:02Z", "author_association": "OWNER", "body": "I tried this patch against `Dockerfile`:\r\n```diff\r\ndiff --git a/Dockerfile b/Dockerfile\r\nindex f4b1414..dd659e1 100644\r\n--- a/Dockerfile\r\n+++ b/Dockerfile\r\n@@ -1,25 +1,26 @@\r\n-FROM python:3.7.10-slim-stretch as build\r\n+FROM python:3.9.2-slim-buster as build\r\n \r\n # Setup build dependencies\r\n RUN apt update \\\r\n-&& apt install -y python3-dev build-essential wget libxml2-dev libproj-dev libgeos-dev libsqlite3-dev zlib1g-dev pkg-config git \\\r\n- && apt clean\r\n+ && apt install -y python3-dev build-essential wget libxml2-dev libproj-dev \\\r\n+ libminizip-dev libgeos-dev libsqlite3-dev zlib1g-dev pkg-config git \\\r\n+ && apt clean\r\n \r\n-\r\n-RUN wget \"https://www.sqlite.org/2020/sqlite-autoconf-3310100.tar.gz\" && tar xzf sqlite-autoconf-3310100.tar.gz \\\r\n- && cd sqlite-autoconf-3310100 && ./configure --disable-static --enable-fts5 --enable-json1 CFLAGS=\"-g -O2 -DSQLITE_ENABLE_FTS3=1 -DSQLITE_ENABLE_FTS3_PARENTHESIS -DSQLITE_ENABLE_FTS4=1 -DSQLITE_ENABLE_RTREE=1 -DSQLITE_ENABLE_JSON1\" \\\r\n+RUN wget \"https://www.sqlite.org/2021/sqlite-autoconf-3340100.tar.gz\" && tar xzf sqlite-autoconf-3340100.tar.gz \\\r\n+ && cd sqlite-autoconf-3340100 && ./configure --disable-static --enable-fts5 --enable-json1 \\\r\n+ CFLAGS=\"-g -O2 -DSQLITE_ENABLE_FTS3=1 -DSQLITE_ENABLE_FTS3_PARENTHESIS -DSQLITE_ENABLE_FTS4=1 -DSQLITE_ENABLE_RTREE=1 -DSQLITE_ENABLE_JSON1\" \\\r\n && make && make install\r\n \r\n-RUN wget \"http://www.gaia-gis.it/gaia-sins/freexl-sources/freexl-1.0.5.tar.gz\" && tar zxf freexl-1.0.5.tar.gz \\\r\n- && cd freexl-1.0.5 && ./configure && make && make install\r\n+RUN wget \"http://www.gaia-gis.it/gaia-sins/freexl-1.0.6.tar.gz\" && tar zxf freexl-1.0.6.tar.gz \\\r\n+ && cd freexl-1.0.6 && ./configure && make && make install\r\n 
\r\n-RUN wget \"http://www.gaia-gis.it/gaia-sins/libspatialite-sources/libspatialite-4.4.0-RC0.tar.gz\" && tar zxf libspatialite-4.4.0-RC0.tar.gz \\\r\n- && cd libspatialite-4.4.0-RC0 && ./configure && make && make install\r\n+RUN wget \"http://www.gaia-gis.it/gaia-sins/libspatialite-5.0.1.tar.gz\" && tar zxf libspatialite-5.0.1.tar.gz \\\r\n+ && cd libspatialite-5.0.1 && ./configure --disable-rttopo && make && make install\r\n \r\n RUN wget \"http://www.gaia-gis.it/gaia-sins/readosm-sources/readosm-1.1.0.tar.gz\" && tar zxf readosm-1.1.0.tar.gz && cd readosm-1.1.0 && ./configure && make && make install\r\n \r\n-RUN wget \"http://www.gaia-gis.it/gaia-sins/spatialite-tools-sources/spatialite-tools-4.4.0-RC0.tar.gz\" && tar zxf spatialite-tools-4.4.0-RC0.tar.gz \\\r\n- && cd spatialite-tools-4.4.0-RC0 && ./configure && make && make install\r\n+RUN wget \"http://www.gaia-gis.it/gaia-sins/spatialite-tools-5.0.0.tar.gz\" && tar zxf spatialite-tools-5.0.0.tar.gz \\\r\n+ && cd spatialite-tools-5.0.0 && ./configure --disable-rttopo && make && make install\r\n \r\n \r\n # Add local code to the image instead of fetching from pypi.\r\n@@ -27,7 +28,7 @@ COPY . /datasette\r\n \r\n RUN pip install /datasette\r\n \r\n-FROM python:3.7.10-slim-stretch\r\n+FROM python:3.9.2-slim-buster\r\n \r\n # Copy python dependencies and spatialite libraries\r\n COPY --from=build /usr/local/lib/ /usr/local/lib/\r\n```\r\n\r\nI had to use `--disable-rttopo` from the tip in https://github.com/OSGeo/gdal/pull/3443 and also needed to install `libminizip-dev`.\r\n\r\nThis works, sort of... I'm getting a weird issue where the `/dbname` page is hanging some of the time instead of loading correctly. 
Other than that it seems to work, but a hanging page is bad!", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 824064069, "label": "Updated Dockerfile with SpatiaLite version 5.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1249#issuecomment-792383956", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1249", "id": 792383956, "node_id": "MDEyOklzc3VlQ29tbWVudDc5MjM4Mzk1Ng==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-08T00:20:09Z", "updated_at": "2021-03-08T00:20:09Z", "author_association": "OWNER", "body": "Worth noting that the Docker image used by `datasette publish cloudrun` doesn't actually use that Datasette docker image - it does this:\r\n\r\nhttps://github.com/simonw/datasette/blob/d0fd833b8cdd97e1b91d0f97a69b494895d82bee/datasette/utils/__init__.py#L349-L353\r\n\r\nWhere the apt extras for SpatiaLite are: https://github.com/simonw/datasette/blob/d0fd833b8cdd97e1b91d0f97a69b494895d82bee/datasette/utils/__init__.py#L344-L345\r\n\r\n`libsqlite3-mod-spatialite` against that official `python:3.8` image doesn't appear to install SpatiaLite 5.0.\r\n\r\n\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 824064069, "label": "Updated Dockerfile with SpatiaLite version 5.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/pull/1223#issuecomment-792233255", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1223", "id": 792233255, "node_id": "MDEyOklzc3VlQ29tbWVudDc5MjIzMzI1NQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-07T07:41:01Z", "updated_at": "2021-03-07T07:41:01Z", "author_association": "OWNER", "body": "This is fantastic, thanks so much for tracking this 
down.", "reactions": "{\"total_count\": 1, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 1, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 806918878, "label": "Add compile option to Dockerfile to fix failing test (fixes #696)"}, "performed_via_github_app": null} {"html_url": "https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790695126", "issue_url": "https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5", "id": 790695126, "node_id": "MDEyOklzc3VlQ29tbWVudDc5MDY5NTEyNg==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-04T15:20:42Z", "updated_at": "2021-03-04T15:20:42Z", "author_association": "MEMBER", "body": "I'm not sure why but my most recent import, when displayed in Datasette, looks like this:\r\n\r\n\"mbox__mbox_emails__753_446_rows\"\r\n\r\nSorting by `id` in the opposite order gives me the data I would expect - so it looks like a bunch of null/blank messages are being imported at some point and showing up first due to ID ordering.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 813880401, "label": "WIP: Add Gmail takeout mbox import"}, "performed_via_github_app": null} {"html_url": "https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790693674", "issue_url": "https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5", "id": 790693674, "node_id": "MDEyOklzc3VlQ29tbWVudDc5MDY5MzY3NA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-04T15:18:36Z", "updated_at": "2021-03-04T15:18:36Z", "author_association": "MEMBER", "body": "I imported my 10GB mbox with 750,000 emails in it, ran this tool (with a hacked fix for the blob column problem) - and now a search that returns 92 results takes 25.37ms! 
This is fantastic.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 813880401, "label": "WIP: Add Gmail takeout mbox import"}, "performed_via_github_app": null} {"html_url": "https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790669767", "issue_url": "https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5", "id": 790669767, "node_id": "MDEyOklzc3VlQ29tbWVudDc5MDY2OTc2Nw==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-04T14:46:06Z", "updated_at": "2021-03-04T14:46:06Z", "author_association": "MEMBER", "body": "Solution could be to pre-process that string by splitting on `(` and dropping everything afterwards, assuming that the `(...)` bit isn't necessary for correctly parsing the date.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 813880401, "label": "WIP: Add Gmail takeout mbox import"}, "performed_via_github_app": null} {"html_url": "https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790668263", "issue_url": "https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5", "id": 790668263, "node_id": "MDEyOklzc3VlQ29tbWVudDc5MDY2ODI2Mw==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-04T14:43:58Z", "updated_at": "2021-03-04T14:43:58Z", "author_association": "MEMBER", "body": "I added this code to output a message ID on errors:\r\n```diff\r\n print(\"Errors: {}\".format(num_errors))\r\n print(traceback.format_exc())\r\n+ print(\"Message-Id: {}\".format(email.get(\"Message-Id\", \"None\")))\r\n continue\r\n```\r\nHaving found a message ID that had an error, I ran this command to see the context:\r\n\r\n rg --text --context 20 '44F289B0.000001.02100@SCHWARZE-DWFXMI' ~/gmail.mbox\r\n\r\nThis was for the following 
error:\r\n```\r\n File \"/Users/simon/Dropbox/Development/google-takeout-to-sqlite/google_takeout_to_sqlite/utils.py\", line 102, in get_mbox\r\n message[\"date\"] = get_message_date(email.get(\"Date\"), email.get_from())\r\n File \"/Users/simon/Dropbox/Development/google-takeout-to-sqlite/google_takeout_to_sqlite/utils.py\", line 178, in get_message_date\r\n datetime_tuple = email.utils.parsedate_tz(mail_date)\r\n File \"/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/email/_parseaddr.py\", line 50, in parsedate_tz\r\n res = _parsedate_tz(data)\r\n File \"/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/email/_parseaddr.py\", line 69, in _parsedate_tz\r\n data = data.split()\r\nAttributeError: 'Header' object has no attribute 'split'\r\n```\r\nHere's what I spotted in the `ripgrep` output:\r\n```\r\n177133570:Message-Id: <44F289B0.000001.02100@SCHWARZE-DWFXMI>\r\n177133571-Date: Mon, 28 Aug 2006 08:14:08 +0200 (Westeurop\ufffdische Sommerzeit)\r\n177133572-X-Mailer: IncrediMail (5002253)\r\n```\r\nSo it could be that `_parsedate_tz` is having trouble with that `Mon, 28 Aug 2006 08:14:08 +0200 (Westeurop\ufffdische Sommerzeit)` string.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 813880401, "label": "WIP: Add Gmail takeout mbox import"}, "performed_via_github_app": null} {"html_url": "https://github.com/dogsheep/google-takeout-to-sqlite/issues/6#issuecomment-790384087", "issue_url": "https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/6", "id": 790384087, "node_id": "MDEyOklzc3VlQ29tbWVudDc5MDM4NDA4Nw==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-04T07:22:51Z", "updated_at": "2021-03-04T07:22:51Z", "author_association": "MEMBER", "body": "#3 also mentions the conflicting version with other tools.", 
"reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 821841046, "label": "Upgrade to latest sqlite-utils"}, "performed_via_github_app": null} {"html_url": "https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790380839", "issue_url": "https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5", "id": 790380839, "node_id": "MDEyOklzc3VlQ29tbWVudDc5MDM4MDgzOQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-04T07:17:05Z", "updated_at": "2021-03-04T07:17:05Z", "author_association": "MEMBER", "body": "Looks like you're doing this:\r\n```python\r\n elif message.get_content_type() == \"text/plain\":\r\n body = message.get_payload(decode=True)\r\n```\r\nSo presumably that decodes to a unicode string?\r\n\r\nI imagine the reason the column is a `BLOB` for me is that `sqlite-utils` determines the column type based on the first batch of items - https://github.com/simonw/sqlite-utils/blob/09c3386f55f766b135b6a1c00295646c4ae29bec/sqlite_utils/db.py#L1927-L1928 - and I got unlucky and had something in my first batch that wasn't a unicode string.\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 813880401, "label": "WIP: Add Gmail takeout mbox import"}, "performed_via_github_app": null} {"html_url": "https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790379629", "issue_url": "https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5", "id": 790379629, "node_id": "MDEyOklzc3VlQ29tbWVudDc5MDM3OTYyOQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-04T07:14:41Z", "updated_at": "2021-03-04T07:14:41Z", "author_association": "MEMBER", "body": "Confirmed: removing the `len()` call does not speed things up, so it's reading through the entire 
file for some other purpose too.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 813880401, "label": "WIP: Add Gmail takeout mbox import"}, "performed_via_github_app": null} {"html_url": "https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790378658", "issue_url": "https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5", "id": 790378658, "node_id": "MDEyOklzc3VlQ29tbWVudDc5MDM3ODY1OA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-04T07:12:48Z", "updated_at": "2021-03-04T07:12:48Z", "author_association": "MEMBER", "body": "It looks like the `body` is being loaded into a BLOB column - so in Datasette by default it looks like this:\r\n\r\n\"mbox__mbox_emails__753_446_rows\"\r\n\r\nIf I `datasette install datasette-render-binary` and then try again I get this:\r\n\r\n\"mbox__mbox_emails__753_446_rows\"\r\n\r\nIt would be great if we could store the `body` as unicode text instead. 
May have to do something clever to decode it based on some kind of charset header?", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 813880401, "label": "WIP: Add Gmail takeout mbox import"}, "performed_via_github_app": null} {"html_url": "https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790373024", "issue_url": "https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5", "id": 790373024, "node_id": "MDEyOklzc3VlQ29tbWVudDc5MDM3MzAyNA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-04T07:01:58Z", "updated_at": "2021-03-04T07:04:06Z", "author_association": "MEMBER", "body": "I got 9 warnings that look like this:\r\n```\r\nErrors: 1\r\nTraceback (most recent call last):\r\n File \"/Users/simon/Dropbox/Development/google-takeout-to-sqlite/google_takeout_to_sqlite/utils.py\", line 103, in get_mbox\r\n message[\"date\"] = get_message_date(email.get(\"Date\"), email.get_from())\r\n File \"/Users/simon/Dropbox/Development/google-takeout-to-sqlite/google_takeout_to_sqlite/utils.py\", line 167, in get_message_date\r\n datetime_tuple = email.utils.parsedate_tz(mail_date)\r\n File \"/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/email/_parseaddr.py\", line 50, in parsedate_tz\r\n res = _parsedate_tz(data)\r\n File \"/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/email/_parseaddr.py\", line 69, in _parsedate_tz\r\n data = data.split()\r\nAttributeError: 'Header' object has no attribute 'split'\r\n```\r\nIt would be useful if those warnings told me the message ID (or similar) of the affected message so I could grep for it in the `mbox` and see what was going on.\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, 
\"eyes\": 0}", "issue": {"value": 813880401, "label": "WIP: Add Gmail takeout mbox import"}, "performed_via_github_app": null} {"html_url": "https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790372621", "issue_url": "https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5", "id": 790372621, "node_id": "MDEyOklzc3VlQ29tbWVudDc5MDM3MjYyMQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-04T07:01:18Z", "updated_at": "2021-03-04T07:01:18Z", "author_association": "MEMBER", "body": "I'm not sure if it would work, but there is an alternative pattern for showing a progress bar against a really large file that I've used in `healthkit-to-sqlite` - you set the progress bar size to the size of the file in bytes, then update a counter as you read the file.\r\n\r\nhttps://github.com/dogsheep/healthkit-to-sqlite/blob/3eb2b06bfe3b4faaf10e9cf9dfcb28e3d16c14ff/healthkit_to_sqlite/cli.py#L24-L57 and https://github.com/dogsheep/healthkit-to-sqlite/blob/3eb2b06bfe3b4faaf10e9cf9dfcb28e3d16c14ff/healthkit_to_sqlite/utils.py#L4-L19 (the `progress_callback()` bit) is where that happens.\r\n\r\nIt can be a bit of a convoluted pattern, and I'm not at all sure it would work for `mbox` files since it looks like that library has other reasons it needs to do a file scan rather than streaming it through one chunk of bytes at a time. 
So I imagine this would not work here.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 813880401, "label": "WIP: Add Gmail takeout mbox import"}, "performed_via_github_app": null} {"html_url": "https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790370485", "issue_url": "https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5", "id": 790370485, "node_id": "MDEyOklzc3VlQ29tbWVudDc5MDM3MDQ4NQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-03-04T06:57:25Z", "updated_at": "2021-03-04T06:57:48Z", "author_association": "MEMBER", "body": "The command takes quite a while to start running, presumably because this line causes it to have to scan the WHOLE file in order to generate a count:\r\n\r\nhttps://github.com/dogsheep/google-takeout-to-sqlite/blob/a3de045eba0fae4b309da21aa3119102b0efc576/google_takeout_to_sqlite/utils.py#L66-L67\r\n\r\nI'm fine with waiting though. It's not like this is a command people run every day - and without that count we can't show a progress bar, which seems pretty important for a process that takes this long.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 813880401, "label": "WIP: Add Gmail takeout mbox import"}, "performed_via_github_app": null}