{"html_url": "https://github.com/simonw/datasette/issues/1836#issuecomment-1271003212", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1836", "id": 1271003212, "node_id": "IC_kwDOBm6k_c5LwfhM", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-10-07T01:52:04Z", "updated_at": "2022-10-07T01:52:04Z", "author_association": "CONTRIBUTOR", "body": "and if we try immutable mode, which is how things are opened by `datasette inspect` we duplicate the files!!!\r\n\r\n```python\r\n# test_sql_immutable.py\r\nimport sqlite3\r\nimport sys\r\n\r\ndb_name = sys.argv[1]\r\nconn = sqlite3.connect(f'file:/app/{db_name}?immutable=1', uri=True)\r\ncur = conn.cursor()\r\ncur.execute('select count(*) from filing')\r\nprint(cur.fetchone())\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1400374908, "label": "docker image is duplicating db files somehow"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/datasette/issues/1836#issuecomment-1270992795", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1836", "id": 1270992795, "node_id": "IC_kwDOBm6k_c5Lwc-b", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-10-07T01:29:15Z", "updated_at": "2022-10-07T01:50:14Z", "author_association": "CONTRIBUTOR", "body": "fascinatingly, telling python to open sqlite in read only mode makes this layer have a size of 0\r\n\r\n```python\r\n# test_sql_ro.py\r\nimport sqlite3\r\nimport sys\r\n\r\ndb_name = sys.argv[1]\r\nconn = sqlite3.connect(f'file:/app/{db_name}?mode=ro', uri=True)\r\ncur = conn.cursor()\r\ncur.execute('select count(*) from filing')\r\nprint(cur.fetchone())\r\n```\r\n\r\nthat's quite weird because setting the file permissions to read only didn't do anything. (on reflection, that chmod isn't doing anything because the dockerfile commands are run as root)", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1400374908, "label": "docker image is duplicating db files somehow"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/datasette/issues/1836#issuecomment-1270988081", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1836", "id": 1270988081, "node_id": "IC_kwDOBm6k_c5Lwb0x", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-10-07T01:19:01Z", "updated_at": "2022-10-07T01:27:35Z", "author_association": "CONTRIBUTOR", "body": "okay, some progress!! running some sql against a database file causes that file to get duplicated even though it apparently doesn't change the file.\r\n\r\nmake a little test script like this:\r\n\r\n```python\r\n# test_sql.py\r\nimport sqlite3\r\nimport sys\r\n\r\ndb_name = sys.argv[1]\r\nconn = sqlite3.connect(f'file:/app/{db_name}', uri=True)\r\ncur = conn.cursor()\r\ncur.execute('select count(*) from filing')\r\nprint(cur.fetchone())\r\n```\r\n\r\nthen \r\n\r\n```docker\r\nRUN python test_sql.py nlrb.db\r\n```\r\n\r\nproduced a layer that's the same size as `nlrb.db`!!\r\n\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1400374908, "label": "docker image is duplicating db files somehow"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/datasette/issues/1836#issuecomment-1270936982", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1836", "id": 1270936982, "node_id": "IC_kwDOBm6k_c5LwPWW", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-10-07T00:52:41Z", "updated_at": "2022-10-07T00:52:41Z", "author_association": "CONTRIBUTOR", "body": "it's not that the inspect command is somehow changing the db files. if i set them to read-only, the \"inspect\" layer still has the same very large size.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1400374908, "label": "docker image is duplicating db files somehow"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/datasette/issues/1836#issuecomment-1270923537", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1836", "id": 1270923537, "node_id": "IC_kwDOBm6k_c5LwMER", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-10-07T00:46:08Z", "updated_at": "2022-10-07T00:46:08Z", "author_association": "CONTRIBUTOR", "body": "i thought it was maybe to do with reading through all the files, but that does not seem to be the case\r\n\r\nif i make a little test file like:\r\n\r\n```python\r\n# test_read.py\r\nimport hashlib\r\nimport sys\r\nimport pathlib\r\n\r\nHASH_BLOCK_SIZE = 1024 * 1024\r\n\r\ndef inspect_hash(path):\r\n    \"\"\"Calculate the hash of a database, efficiently.\"\"\"\r\n    m = hashlib.sha256()\r\n    with path.open(\"rb\") as fp:\r\n        while True:\r\n            data = fp.read(HASH_BLOCK_SIZE)\r\n            if not data:\r\n                break\r\n            m.update(data)\r\n\r\n    return m.hexdigest()\r\n\r\ninspect_hash(pathlib.Path(sys.argv[1]))\r\n```\r\n\r\nthen a line in the Dockerfile like\r\n\r\n```docker\r\nRUN python test_read.py nlrb.db && echo \"[]\" > /etc/inspect.json\r\n```\r\n\r\njust produces a layer of `3B`\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1400374908, "label": "docker image is duplicating db files somehow"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/datasette/pull/1837#issuecomment-1270855853", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1837", "id": 1270855853, "node_id": "IC_kwDOBm6k_c5Lv7it", "user": {"value": 22429695, "label": "codecov[bot]"}, "created_at": "2022-10-07T00:01:20Z", "updated_at": "2022-10-07T00:01:20Z", "author_association": "NONE", "body": "# [Codecov](https://codecov.io/gh/simonw/datasette/pull/1837?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison) Report\nBase: **92.50**% // Head: **92.50**% // No change to project coverage :thumbsup:\n> Coverage data is based on head [(`c12447e`)](https://codecov.io/gh/simonw/datasette/pull/1837?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison) compared to base [(`eff1124`)](https://codecov.io/gh/simonw/datasette/commit/eff112498ecc499323c26612d707908831446d25?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison).\n> Patch has no changes to coverable lines.\n\nAdditional details and impacted files
\n\n\n```diff\n@@ Coverage Diff @@\n## main #1837 +/- ##\n=======================================\n Coverage 92.50% 92.50% \n=======================================\n Files 35 35 \n Lines 4400 4400 \n=======================================\n Hits 4070 4070 \n Misses 330 330 \n```\n\n\n\nHelp us with your feedback. Take ten seconds to tell us [how you rate us](https://about.codecov.io/nps?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison). Have a feature suggestion? [Share it here.](https://app.codecov.io/gh/feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison)\n\n \n\n[:umbrella: View full report at Codecov](https://codecov.io/gh/simonw/datasette/pull/1837?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison). \n:loudspeaker: Do you have feedback about the report comment? [Let us know in this issue](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison).\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1400431789, "label": "Make hash and size a lazy property"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/datasette/pull/1835#issuecomment-1270595328", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1835", "id": 1270595328, "node_id": "IC_kwDOBm6k_c5Lu78A", "user": {"value": 22429695, "label": "codecov[bot]"}, "created_at": "2022-10-06T19:42:25Z", "updated_at": "2022-10-06T19:42:25Z", "author_association": "NONE", "body": "# [Codecov](https://codecov.io/gh/simonw/datasette/pull/1835?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison) Report\nBase: **91.71**% // Head: **92.50**% // Increases project coverage by **`+0.78%`** :tada:\n> Coverage data is based on head [(`b4b92df`)](https://codecov.io/gh/simonw/datasette/pull/1835?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison) compared to base [(`cb1e093`)](https://codecov.io/gh/simonw/datasette/commit/cb1e093fd361b758120aefc1a444df02462389a3?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison).\n> Patch has no changes to coverable lines.\n\nAdditional details and impacted files
\n\n\n```diff\n@@ Coverage Diff @@\n## main #1835 +/- ##\n==========================================\n+ Coverage 91.71% 92.50% +0.78% \n==========================================\n Files 38 35 -3 \n Lines 4754 4400 -354 \n==========================================\n- Hits 4360 4070 -290 \n+ Misses 394 330 -64 \n```\n\n\n| [Impacted Files](https://codecov.io/gh/simonw/datasette/pull/1835?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison) | Coverage \u0394 | |\n|---|---|---|\n| [datasette/database.py](https://codecov.io/gh/simonw/datasette/pull/1835/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison#diff-ZGF0YXNldHRlL2RhdGFiYXNlLnB5) | | |\n| [datasette/utils/shutil\\_backport.py](https://codecov.io/gh/simonw/datasette/pull/1835/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison#diff-ZGF0YXNldHRlL3V0aWxzL3NodXRpbF9iYWNrcG9ydC5weQ==) | | |\n| [datasette/\\_\\_init\\_\\_.py](https://codecov.io/gh/simonw/datasette/pull/1835/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison#diff-ZGF0YXNldHRlL19faW5pdF9fLnB5) | | |\n| [datasette/views/base.py](https://codecov.io/gh/simonw/datasette/pull/1835/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison#diff-ZGF0YXNldHRlL3ZpZXdzL2Jhc2UucHk=) | `94.75% <0.00%> (+0.01%)` | :arrow_up: |\n\nHelp us with your feedback. Take ten seconds to tell us [how you rate us](https://about.codecov.io/nps?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison). Have a feature suggestion? 
[Share it here.](https://app.codecov.io/gh/feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison)\n\n \n\n[:umbrella: View full report at Codecov](https://codecov.io/gh/simonw/datasette/pull/1835?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison). \n:loudspeaker: Do you have feedback about the report comment? [Let us know in this issue](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison).\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1400121355, "label": "use inspect data for hash and file size"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/datasette/pull/1835#issuecomment-1270586897", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1835", "id": 1270586897, "node_id": "IC_kwDOBm6k_c5Lu54R", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-10-06T19:34:00Z", "updated_at": "2022-10-06T19:34:00Z", "author_association": "OWNER", "body": "Wow, great catch! The whole point of inspect data was to avoid this kind of expensive operation on startup so this makes total sense - I had no idea Datasette was still trying to hash a giant file every time the server started.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1400121355, "label": "use inspect data for hash and file size"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/datasette/issues/1480#issuecomment-1269847461", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1480", "id": 1269847461, "node_id": "IC_kwDOBm6k_c5LsFWl", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-10-06T11:21:49Z", "updated_at": "2022-10-06T11:21:49Z", "author_association": "CONTRIBUTOR", "body": "thanks @simonw, i'll spend a little more time trying to figure out why this isn't working on cloudrun, and then will flip over to fly if i can't.\r\n\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1015646369, "label": "Exceeding Cloud Run memory limits when deploying a 4.8G database"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/datasette/issues/1480#issuecomment-1269275153", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1480", "id": 1269275153, "node_id": "IC_kwDOBm6k_c5Lp5oR", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-10-06T03:54:33Z", "updated_at": "2022-10-06T03:54:33Z", "author_association": "OWNER", "body": "I've been having success using Fly recently for a project which I thought would be too large for Cloud Run. I wrote about that here:\r\n\r\n- https://simonwillison.net/2022/Sep/5/laion-aesthetics-weeknotes/", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1015646369, "label": "Exceeding Cloud Run memory limits when deploying a 4.8G database"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/datasette/issues/1480#issuecomment-1268629159", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1480", "id": 1268629159, "node_id": "IC_kwDOBm6k_c5Lnb6n", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-10-05T16:00:55Z", "updated_at": "2022-10-05T16:00:55Z", "author_association": "CONTRIBUTOR", "body": "as a next step, i'll fetch the docker image from the google registry, and see what memory and disk usage looks like when i run it locally.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1015646369, "label": "Exceeding Cloud Run memory limits when deploying a 4.8G database"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/datasette/issues/1480#issuecomment-1268613335", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1480", "id": 1268613335, "node_id": "IC_kwDOBm6k_c5LnYDX", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-10-05T15:45:49Z", "updated_at": "2022-10-05T15:45:49Z", "author_association": "CONTRIBUTOR", "body": "running into this as i continue to grow my labor data warehouse.\r\n\r\nHere a CloudRun PM says the container size should **not** count against memory: https://stackoverflow.com/a/56570717", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1015646369, "label": "Exceeding Cloud Run memory limits when deploying a 4.8G database"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/datasette/issues/1824#issuecomment-1268398461", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1824", "id": 1268398461, "node_id": "IC_kwDOBm6k_c5Lmjl9", "user": {"value": 562352, "label": "CharlesNepote"}, "created_at": "2022-10-05T12:55:05Z", "updated_at": "2022-10-05T12:55:05Z", "author_association": "NONE", "body": "Here is some working javascript code. There might be a better solution, I'm not a JS expert.\r\n```javascript\r\n// sql_element is assumed to be defined elsewhere (the element wrapping the SQL query)\r\nvar show_hide = document.querySelector(\".show-hide-sql > a\");\r\n\r\n// Hide SQL query if the URL opened with #_hide_sql\r\nvar hash = window.location.hash;\r\nif (hash === \"#_hide_sql\") {\r\n  hide_sql();\r\n}\r\nshow_hide.setAttribute(\"href\", \"#\");\r\nshow_hide.addEventListener(\"click\", toggle_sql_display);\r\n\r\nfunction toggle_sql_display() {\r\n  if (show_hide.innerText === \"hide\") {\r\n    hide_sql();\r\n    return;\r\n  }\r\n  if (show_hide.innerText === \"show\") {\r\n    show_sql();\r\n    return;\r\n  }\r\n}\r\n\r\nfunction hide_sql() {\r\n  sql_element.style.cssText = \"display:none\";\r\n  show_hide.innerHTML = \"show\";\r\n  show_hide.setAttribute(\"href\", \"#_hide_sql\");\r\n}\r\n\r\nfunction show_sql() {\r\n  sql_element.style.cssText = \"display:block\";\r\n  show_hide.innerHTML = \"hide\";\r\n  show_hide.setAttribute(\"href\", \"#_show_sql\");\r\n}\r\n```\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1387712501, "label": "Convert &_hide_sql=1 to #_hide_sql"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/datasette/pull/1823#issuecomment-1258833358", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1823", "id": 1258833358, "node_id": "IC_kwDOBm6k_c5LCEXO", "user": {"value": 22429695, "label": "codecov[bot]"}, "created_at": "2022-09-27T00:54:15Z", "updated_at": "2022-10-05T04:37:54Z", "author_association": "NONE", "body": "# [Codecov](https://codecov.io/gh/simonw/datasette/pull/1823?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison) Report\nBase: **91.58**% // Head: **92.50**% // Increases project coverage by **`+0.91%`** :tada:\n> Coverage data is based on head [(`b545b6a`)](https://codecov.io/gh/simonw/datasette/pull/1823?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison) compared to base [(`5f9f567`)](https://codecov.io/gh/simonw/datasette/commit/5f9f567acbc58c9fcd88af440e68034510fb5d2b?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison).\n> Patch coverage: 90.47% of modified lines in pull request are covered.\n\nAdditional details and impacted files
\n\n\n```diff\n@@ Coverage Diff @@\n## main #1823 +/- ##\n==========================================\n+ Coverage 91.58% 92.50% +0.91% \n==========================================\n Files 36 35 -1 \n Lines 4444 4400 -44 \n==========================================\n Hits 4070 4070 \n+ Misses 374 330 -44 \n```\n\n\n| [Impacted Files](https://codecov.io/gh/simonw/datasette/pull/1823?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison) | Coverage \u0394 | |\n|---|---|---|\n| [datasette/utils/asgi.py](https://codecov.io/gh/simonw/datasette/pull/1823/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison#diff-ZGF0YXNldHRlL3V0aWxzL2FzZ2kucHk=) | `91.06% <88.23%> (\u00f8)` | |\n| [datasette/app.py](https://codecov.io/gh/simonw/datasette/pull/1823/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison#diff-ZGF0YXNldHRlL2FwcC5weQ==) | `94.11% <100.00%> (\u00f8)` | |\n| [datasette/utils/shutil\\_backport.py](https://codecov.io/gh/simonw/datasette/pull/1823/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison#diff-ZGF0YXNldHRlL3V0aWxzL3NodXRpbF9iYWNrcG9ydC5weQ==) | | |\n\nHelp us with your feedback. Take ten seconds to tell us [how you rate us](https://about.codecov.io/nps?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison). Have a feature suggestion? [Share it here.](https://app.codecov.io/gh/feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison)\n\n \n\n[:umbrella: View full report at Codecov](https://codecov.io/gh/simonw/datasette/pull/1823?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison). 
\n:loudspeaker: Do you have feedback about the report comment? [Let us know in this issue](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison).\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1386917344, "label": "Keyword-only arguments for a bunch of internal methods"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/datasette/issues/1832#issuecomment-1267925830", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1832", "id": 1267925830, "node_id": "IC_kwDOBm6k_c5LkwNG", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-10-05T04:31:57Z", "updated_at": "2022-10-05T04:31:57Z", "author_association": "OWNER", "body": "Turns out this already works - `__bool__` falls back on `__len__`: https://docs.python.org/3/reference/datamodel.html#object.__bool__\r\n\r\n> When this method is not defined, [`__len__()`](https://docs.python.org/3/reference/datamodel.html#object.__len__ \"object.__len__\") is called, if it is defined, and the object is considered true if its result is nonzero.\r\n\r\nI'll add a test to demonstrate this.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1397193691, "label": "__bool__ method on Results"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/datasette/issues/1832#issuecomment-1267918117", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1832", "id": 1267918117, "node_id": "IC_kwDOBm6k_c5LkuUl", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-10-05T04:19:52Z", "updated_at": "2022-10-05T04:19:52Z", "author_association": "OWNER", "body": "Code can go here:\r\n\r\nhttps://github.com/simonw/datasette/blob/b6ba117b7978b58b40e3c3c2b723b92c3010ed53/datasette/database.py#L511-L515\r\n\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1397193691, "label": "__bool__ method on Results"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/datasette/issues/1829#issuecomment-1267709546", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1829", "id": 1267709546, "node_id": "IC_kwDOBm6k_c5Lj7Zq", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-10-04T23:19:24Z", "updated_at": "2022-10-04T23:21:07Z", "author_association": "OWNER", "body": "There's also a `check_visibility()` helper which I'm not using in these particular cases but which may be relevant. It's called like this:\r\n\r\nhttps://github.com/simonw/datasette/blob/4218c9cd742b79b1e3cb80878e42b7e39d16ded2/datasette/views/database.py#L65-L77\r\n\r\nAnd is defined here: https://github.com/simonw/datasette/blob/4218c9cd742b79b1e3cb80878e42b7e39d16ded2/datasette/app.py#L694-L710\r\n\r\nIt's actually documented as a public method here: https://docs.datasette.io/en/stable/internals.html#await-check-visibility-actor-action-resource-none\r\n\r\n> This convenience method can be used to answer the question \"should this item be considered private, in that it is visible to me but it is not visible to anonymous users?\"\r\n>\r\n> It returns a tuple of two booleans, `(visible, private)`. `visible` indicates if the actor can see this resource. `private` will be `True` if an anonymous user would not be able to view the resource.\r\n\r\nNote that this documented method cannot actually do the right thing - because it's not being given the multiple permissions that need to be checked in order to completely answer the question.\r\n\r\nSo I probably need to redesign that method a bit.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1396948693, "label": "Table/database that is private due to inherited permissions does not show padlock"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/datasette/issues/1829#issuecomment-1267708232", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1829", "id": 1267708232, "node_id": "IC_kwDOBm6k_c5Lj7FI", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-10-04T23:17:36Z", "updated_at": "2022-10-04T23:17:36Z", "author_association": "OWNER", "body": "Here's the relevant code from the table page:\r\n\r\nhttps://github.com/simonw/datasette/blob/4218c9cd742b79b1e3cb80878e42b7e39d16ded2/datasette/views/table.py#L215-L227\r\n\r\nNote how `ensure_permissions()` there takes the table, database and instance into account... but the `private` assignment (used to decide if the padlock should display or not) only considers the `view-table` check.\r\n\r\nHere's the same code for the database page:\r\n\r\nhttps://github.com/simonw/datasette/blob/4218c9cd742b79b1e3cb80878e42b7e39d16ded2/datasette/views/database.py#L139-L141\r\n\r\nAnd for canned query pages:\r\n\r\nhttps://github.com/simonw/datasette/blob/4218c9cd742b79b1e3cb80878e42b7e39d16ded2/datasette/views/database.py#L228-L240\r\n\r\n\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1396948693, "label": "Table/database that is private due to inherited permissions does not show padlock"}, "performed_via_github_app": null}
{"html_url": "https://github.com/dogsheep/github-to-sqlite/pull/65#issuecomment-1266141699", "issue_url": "https://api.github.com/repos/dogsheep/github-to-sqlite/issues/65", "id": 1266141699, "node_id": "IC_kwDODFdgUs5Ld8oD", "user": {"value": 231498, "label": "khimaros"}, "created_at": "2022-10-03T22:35:03Z", "updated_at": "2022-10-03T22:35:03Z", "author_association": "NONE", "body": "@simonw rebased against latest, please let me know if i should drop this PR.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 923270900, "label": "basic support for events"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/datasette/issues/1805#issuecomment-1265161668", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1805", "id": 1265161668, "node_id": "IC_kwDOBm6k_c5LaNXE", "user": {"value": 562352, "label": "CharlesNepote"}, "created_at": "2022-10-03T09:18:05Z", "updated_at": "2022-10-03T09:18:05Z", "author_association": "NONE", "body": "> I'm tempted to add `word-wrap: anywhere` only to links that are known to be longer than a certain threshold.\r\n\r\nMakes sense IMHO.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1363552780, "label": "truncate_cells_html does not work for links?"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/datasette/issues/485#issuecomment-1264769569", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/485", "id": 1264769569, "node_id": "IC_kwDOBm6k_c5LYtoh", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-10-03T00:04:42Z", "updated_at": "2022-10-03T00:04:42Z", "author_association": "OWNER", "body": "I love these tips - tools that can compile a simple machine learning model to a SQL query! Would be pretty cool if I could bundle a model in Datasette itself as a big in-memory SQLite SQL query:\r\n\r\n- https://github.com/Chryzanthemum/xgb2sql\r\n- https://github.com/konstantint/SKompiler", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 447469253, "label": "Improvements to table label detection "}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/datasette/issues/1805#issuecomment-1264753894", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1805", "id": 1264753894, "node_id": "IC_kwDOBm6k_c5LYpzm", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-10-02T23:02:54Z", "updated_at": "2022-10-02T23:02:54Z", "author_association": "OWNER", "body": "I'm tempted to add `word-wrap: anywhere` only to links that are known to be longer than a certain threshold.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1363552780, "label": "truncate_cells_html does not work for links?"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/datasette/issues/1805#issuecomment-1264753725", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1805", "id": 1264753725, "node_id": "IC_kwDOBm6k_c5LYpw9", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-10-02T23:02:17Z", "updated_at": "2022-10-02T23:02:17Z", "author_association": "OWNER", "body": "After reverting `word--wrap anywhere` https://latest.datasette.io/_memory?sql=select+%27https%3A%2F%2Fexample.com%2Faaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.jpg%27+as+truncated now looks like this, which isn't as good:\r\n\r\n\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1363552780, "label": "truncate_cells_html does not work for links?"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/datasette/issues/1828#issuecomment-1264753439", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1828", "id": 1264753439, "node_id": "IC_kwDOBm6k_c5LYpsf", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-10-02T23:01:17Z", "updated_at": "2022-10-02T23:01:17Z", "author_association": "OWNER", "body": "That change deployed and https://github-to-sqlite.dogsheep.net/github/commits now looks like this:\r\n\r\n\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1393903845, "label": "word-wrap: anywhere resulting in weird display"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/datasette/issues/1828#issuecomment-1264738081", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1828", "id": 1264738081, "node_id": "IC_kwDOBm6k_c5LYl8h", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-10-02T21:34:37Z", "updated_at": "2022-10-02T21:34:37Z", "author_association": "OWNER", "body": "I'm running a build of that demo instance here (takes ~30m) https://github.com/dogsheep/github-to-sqlite/actions/runs/3170164705", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1393903845, "label": "word-wrap: anywhere resulting in weird display"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/datasette/issues/485#issuecomment-1264737290", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/485", "id": 1264737290, "node_id": "IC_kwDOBm6k_c5LYlwK", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-10-02T21:29:59Z", "updated_at": "2022-10-02T21:29:59Z", "author_association": "OWNER", "body": "To clarify: the feature this issue is talking about relates to the way Datasette automatically displays foreign key relationships, for example on this page: https://github-to-sqlite.dogsheep.net/github/commits\r\n\r\n\r\n\r\nEach of those columns is a foreign key to another table. The link text that is displayed there comes from the \"label column\" that has either been configured or automatically detected for that other table.\r\n\r\nI wonder if this could be handled with a tiny machine learning model that's trained to help pick the best label column?\r\n\r\nInputs to that model could include:\r\n\r\n- The names of the columns\r\n- The number of unique values in each column\r\n- The type of each column (or maybe only `TEXT` columns should be considered)\r\n- How many `null` values there are\r\n- Is the column marked as unique?\r\n- What's the average (or median or some other statistic) string length of values in each column?\r\n\r\nOutput would be the most likely label column, or some indicator that no likely candidates had been found.\r\n\r\nMy hunch is that this would be better solved using a few extra heuristics rather than by training a model, but it does feel like an interesting opportunity to experiment with a tiny ML model.\r\n\r\nAsked for tips about this on Twitter: https://twitter.com/simonw/status/1576680930680262658\r\n\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 447469253, "label": "Improvements to table label detection "}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/datasette/issues/1805#issuecomment-1264736537", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1805", "id": 1264736537, "node_id": "IC_kwDOBm6k_c5LYlkZ", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-10-02T21:25:37Z", "updated_at": "2022-10-02T21:25:37Z", "author_association": "OWNER", "body": "`word-wrap: anywhere` had some nasty side-effects, removing that:\r\n\r\n- #1828", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1363552780, "label": "truncate_cells_html does not work for links?"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/sqlite-utils/issues/409#issuecomment-1264223554", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/409", "id": 1264223554, "node_id": "IC_kwDOCGYnMM5LWoVC", "user": {"value": 7908073, "label": "chapmanjacobd"}, "created_at": "2022-10-01T03:42:50Z", "updated_at": "2022-10-01T03:42:50Z", "author_association": "CONTRIBUTOR", "body": "oh weird. it inserts into db2", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1149661489, "label": "`with db:` for transactions"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/sqlite-utils/issues/409#issuecomment-1264223363", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/409", "id": 1264223363, "node_id": "IC_kwDOCGYnMM5LWoSD", "user": {"value": 7908073, "label": "chapmanjacobd"}, "created_at": "2022-10-01T03:41:45Z", "updated_at": "2022-10-01T03:41:45Z", "author_association": "CONTRIBUTOR", "body": "```\r\npytest xklb/check.py --pdb\r\n\r\nxklb/check.py:11: in test_transaction\r\n assert list(db2[\"t\"].rows) == []\r\nE AssertionError: assert [{'foo': 1}] == []\r\nE + where [{'foo': 1}] = list()\r\nE + where = .rows\r\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> entering PDB >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>\r\n\r\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> PDB post_mortem (IO-capturing turned off) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>\r\n> /home/xk/github/xk/lb/xklb/check.py(11)test_transaction()\r\n 9 with db1.conn:\r\n 10 db1[\"t\"].insert({\"foo\": 1})\r\n---> 11 assert list(db2[\"t\"].rows) == []\r\n 12 assert list(db2[\"t\"].rows) == [{\"foo\": 1}]\r\n```\r\n\r\nIt fails because it is already inserted.\r\n\r\nbtw if you put these two lines in you pyproject.toml you can get `ipdb` in pytest\r\n\r\n```\r\n[tool.pytest.ini_options]\r\naddopts = \"--pdbcls=IPython.terminal.debugger:TerminalPdb --ignore=tests/data --capture=tee-sys --log-cli-level=ERROR\"\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1149661489, "label": "`with db:` for transactions"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/sqlite-utils/issues/493#issuecomment-1264219650", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/493", "id": 1264219650, "node_id": "IC_kwDOCGYnMM5LWnYC", "user": {"value": 7908073, "label": "chapmanjacobd"}, "created_at": "2022-10-01T03:22:50Z", "updated_at": "2022-10-01T03:23:58Z", "author_association": "CONTRIBUTOR", "body": "this is likely what you are looking for: https://stackoverflow.com/a/51076749/697964\r\n\r\nbut yeah I would say just disable smart quotes", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1386562662, "label": "Tiny typographical error in install/uninstall docs"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/datasette/pull/1827#issuecomment-1263570186", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1827", "id": 1263570186, "node_id": "IC_kwDOBm6k_c5LUI0K", "user": {"value": 22429695, "label": "codecov[bot]"}, "created_at": "2022-09-30T13:22:15Z", "updated_at": "2022-09-30T13:22:15Z", "author_association": "NONE", "body": "# [Codecov](https://codecov.io/gh/simonw/datasette/pull/1827?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison) Report\nBase: **92.50**% // Head: **92.50**% // No change to project coverage :thumbsup:\n> Coverage data is based on head [(`1f0c557`)](https://codecov.io/gh/simonw/datasette/pull/1827?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison) compared to base [(`34defdc`)](https://codecov.io/gh/simonw/datasette/commit/34defdc10aa293294ca01cfab70780755447e1d7?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison).\n> Patch has no changes to coverable lines.\n\nAdditional details and impacted files
\n\n\n```diff\n@@ Coverage Diff @@\n## main #1827 +/- ##\n=======================================\n Coverage 92.50% 92.50% \n=======================================\n Files 35 35 \n Lines 4400 4400 \n=======================================\n Hits 4070 4070 \n Misses 330 330 \n```\n\n\n\nHelp us with your feedback. Take ten seconds to tell us [how you rate us](https://about.codecov.io/nps?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison). Have a feature suggestion? [Share it here.](https://app.codecov.io/gh/feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison)\n\n \n\n[:umbrella: View full report at Codecov](https://codecov.io/gh/simonw/datasette/pull/1827?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison). \n:loudspeaker: Do you have feedback about the report comment? [Let us know in this issue](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison).\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1392426838, "label": "Bump furo from 2022.9.15 to 2022.9.29"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/sqlite-utils/issues/297#issuecomment-1262920929", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/297", "id": 1262920929, "node_id": "IC_kwDOCGYnMM5LRqTh", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-09-29T23:06:44Z", "updated_at": "2022-09-29T23:06:44Z", "author_association": "OWNER", "body": "Currently the only other use of `-t` is for this:\r\n```\r\n -t, --table Output as a formatted table\r\n```\r\nSo I think it's OK to use it to mean something slightly different for this command, since `sqlite-utils insert` doesn't do any output of data in any format.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 944846776, "label": "Option for importing CSV data using the SQLite .import mechanism"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/sqlite-utils/issues/297#issuecomment-1262918833", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/297", "id": 1262918833, "node_id": "IC_kwDOCGYnMM5LRpyx", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-09-29T23:02:52Z", "updated_at": "2022-09-29T23:02:52Z", "author_association": "OWNER", "body": "The other nice thing about having this as a separate command is that I can implement a tiny subset of the overall `sqlite-utils insert` features at first, and then add additional features in subsequent releases.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 944846776, "label": "Option for importing CSV data using the SQLite .import mechanism"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/sqlite-utils/issues/297#issuecomment-1262917059", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/297", "id": 1262917059, "node_id": "IC_kwDOCGYnMM5LRpXD", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-09-29T22:59:28Z", "updated_at": "2022-09-29T22:59:28Z", "author_association": "OWNER", "body": "I quite like `sqlite-utils fast-csv` - I think it's clear enough what it does, and running `--help` can clarify if needed.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 944846776, "label": "Option for importing CSV data using the SQLite .import mechanism"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/sqlite-utils/issues/297#issuecomment-1262915322", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/297", "id": 1262915322, "node_id": "IC_kwDOCGYnMM5LRo76", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-09-29T22:57:31Z", "updated_at": "2022-09-29T22:57:42Z", "author_association": "OWNER", "body": "Maybe `sqlite-utils fast-csv` is right? Not entirely clear that's an insert though as opposed to a faster version of in-memory querying in the style of `sqlite-utils memory`.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 944846776, "label": "Option for importing CSV data using the SQLite .import mechanism"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/sqlite-utils/issues/297#issuecomment-1262914416", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/297", "id": 1262914416, "node_id": "IC_kwDOCGYnMM5LRotw", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-09-29T22:56:53Z", "updated_at": "2022-09-29T22:56:53Z", "author_association": "OWNER", "body": "Potential names/designs:\r\n\r\n- `sqlite-utils fast data.db rows rows.csv`\r\n- `sqlite-utils insert-fast data.db rows rows.csv`\r\n- `sqlite-utils fast-csv data.db rows rows.csv`\r\n\r\nOr more interestingly... what if it could accept multiple CSV files to create multiple tables?\r\n\r\n- `sqlite-utils fast data.db rows.csv other.csv`\r\n\r\nWould still need to support creating tables with different names though. Maybe like this:\r\n\r\n- `sqlite-utils fast data.db -t mytable rows.csv -t othertable other.csv`\r\n\r\nI seem to be leaning towards `fast` as the command name, but as a standalone command name it's a bit meaningless - how do we know that's about CSV import and not about fast querying or similar?", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 944846776, "label": "Option for importing CSV data using the SQLite .import mechanism"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/sqlite-utils/issues/297#issuecomment-1262913145", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/297", "id": 1262913145, "node_id": "IC_kwDOCGYnMM5LRoZ5", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-09-29T22:54:13Z", "updated_at": "2022-09-29T22:54:13Z", "author_association": "OWNER", "body": "After reviewing `sqlite-utils insert --help` I'm confident that MOST of these options wouldn't make sense for a \"fast\" moder that just supports CSV and works by piping directly to the `sqlite3` binary:\r\n\r\nhttps://github.com/simonw/sqlite-utils/blob/d792dad1cf5f16525da81b1e162fb71d469995f3/docs/cli-reference.rst#L251-L279\r\n\r\nI'm going to implement a separate command instead.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 944846776, "label": "Option for importing CSV data using the SQLite .import mechanism"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/datasette/issues/370#issuecomment-1261930179", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/370", "id": 1261930179, "node_id": "IC_kwDOBm6k_c5LN4bD", "user": {"value": 72577720, "label": "MichaelTiemannOSC"}, "created_at": "2022-09-29T08:17:46Z", "updated_at": "2022-09-29T08:17:46Z", "author_association": "CONTRIBUTOR", "body": "Just watched this video which demonstrates the integration of *any* webapp into JupyterLab: https://youtu.be/FH1dKKmvFtc\r\n\r\nMaybe this is the answer?", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 377155320, "label": "Integration with JupyterLab"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/datasette/issues/1624#issuecomment-1261194164", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1624", "id": 1261194164, "node_id": "IC_kwDOBm6k_c5LLEu0", "user": {"value": 38532, "label": "palfrey"}, "created_at": "2022-09-28T16:54:22Z", "updated_at": "2022-09-28T16:54:22Z", "author_association": "NONE", "body": "https://github.com/simonw/datasette-cors seems to workaround this", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1122427321, "label": "Index page `/` has no CORS headers"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/datasette/issues/1062#issuecomment-1260909128", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1062", "id": 1260909128, "node_id": "IC_kwDOBm6k_c5LJ_JI", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-09-28T13:22:53Z", "updated_at": "2022-09-28T14:09:54Z", "author_association": "CONTRIBUTOR", "body": "if you went this route:\r\n\r\n```python\r\nwith sqlite_timelimit(conn, time_limit_ms):\r\n c.execute(query)\r\n for chunk in c.fetchmany(chunk_size):\r\n yield from chunk\r\n```\r\n\r\nthen `time_limit_ms` would probably have to be greatly extended, because the time spent in the loop will depend on the downstream processing.\r\n\r\ni wonder if this was why you were thinking this feature would need a dedicated connection?\r\n\r\n---\r\n\r\nreading more, there's no real limit i can find on the number of active cursors (or more precisely active prepared statements objects, because sqlite doesn't really have cursors). \r\n\r\nmaybe something like this would be okay?\r\n\r\n```python\r\nwith sqlite_timelimit(conn, time_limit_ms):\r\n c.execute(query)\r\n # step through at least one to evaluate the statement, not sure if this is necessary\r\n yield c.execute.fetchone()\r\nfor chunk in c.fetchmany(chunk_size):\r\n yield from chunk\r\n```\r\n\r\nthis seems quite weird that there's not more of limit of the number of active prepared statements, but i haven't been able to find one.\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 732674148, "label": "Refactor .csv to be an output renderer - and teach register_output_renderer to stream all rows"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/datasette/issues/1062#issuecomment-1260829829", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1062", "id": 1260829829, "node_id": "IC_kwDOBm6k_c5LJryF", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-09-28T12:27:19Z", "updated_at": "2022-09-28T12:27:19Z", "author_association": "CONTRIBUTOR", "body": "for teaching `register_output_renderer` to stream it seems like the two options are to\r\n\r\n1. a [nested query technique ](https://github.com/simonw/datasette/issues/526#issuecomment-505162238)to paginate through\r\n2. a fetching model that looks like something\r\n```python\r\nwith sqlite_timelimit(conn, time_limit_ms):\r\n c.execute(query)\r\n for chunk in c.fetchmany(chunk_size):\r\n yield from chunk\r\n```\r\ncurrently `db.execute` is not a generator, so this would probably need a new method?", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 732674148, "label": "Refactor .csv to be an output renderer - and teach register_output_renderer to stream all rows"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/datasette/issues/1826#issuecomment-1260373403", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1826", "id": 1260373403, "node_id": "IC_kwDOBm6k_c5LH8Wb", "user": {"value": 66709385, "label": "pjamargh"}, "created_at": "2022-09-28T04:30:27Z", "updated_at": "2022-09-28T04:30:27Z", "author_association": "NONE", "body": "I'm glad the bug report served some purpose. Frankly I just needed the\nmethod signature, that is why the documentation you mention wasn't read.\n\nOn Tue, Sep 27, 2022, 9:05 PM Simon Willison ***@***.***>\nwrote:\n\n> Though now I notice that the copy right there needs to be updated to\n> reflect the new row parameter to render_cell!\n>\n> \u2014\n> Reply to this email directly, view it on GitHub\n> ,\n> or unsubscribe\n> \n> .\n> You are receiving this because you authored the thread.Message ID:\n> ***@***.***>\n>\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1388631785, "label": "render_cell documentation example doesn't match the method signature"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/datasette/pull/1825#issuecomment-1260368537", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1825", "id": 1260368537, "node_id": "IC_kwDOBm6k_c5LH7KZ", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-09-28T04:21:18Z", "updated_at": "2022-09-28T04:21:18Z", "author_association": "OWNER", "body": "This is great, thank you very much!\r\n\r\nhttps://datasette--1825.org.readthedocs.build/en/1825/deploying.html#running-datasette-using-openrc", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1388227245, "label": "Add documentation for serving via OpenRC"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/datasette/pull/1825#issuecomment-1260368122", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1825", "id": 1260368122, "node_id": "IC_kwDOBm6k_c5LH7D6", "user": {"value": 22429695, "label": "codecov[bot]"}, "created_at": "2022-09-28T04:20:28Z", "updated_at": "2022-09-28T04:20:28Z", "author_association": "NONE", "body": "# [Codecov](https://codecov.io/gh/simonw/datasette/pull/1825?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison) Report\nBase: **91.58**% // Head: **91.58**% // No change to project coverage :thumbsup:\n> Coverage data is based on head [(`b16eb2f`)](https://codecov.io/gh/simonw/datasette/pull/1825?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison) compared to base [(`5f9f567`)](https://codecov.io/gh/simonw/datasette/commit/5f9f567acbc58c9fcd88af440e68034510fb5d2b?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison).\n> Patch has no changes to coverable lines.\n\n> :exclamation: Current head b16eb2f differs from pull request most recent head e7e96dc. Consider uploading reports for the commit e7e96dc to get more accurate results\n\nAdditional details and impacted files
\n\n\n```diff\n@@ Coverage Diff @@\n## main #1825 +/- ##\n=======================================\n Coverage 91.58% 91.58% \n=======================================\n Files 36 36 \n Lines 4444 4444 \n=======================================\n Hits 4070 4070 \n Misses 374 374 \n```\n\n\n\nHelp us with your feedback. Take ten seconds to tell us [how you rate us](https://about.codecov.io/nps?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison). Have a feature suggestion? [Share it here.](https://app.codecov.io/gh/feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison)\n\n \n\n[:umbrella: View full report at Codecov](https://codecov.io/gh/simonw/datasette/pull/1825?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison). \n:loudspeaker: Do you have feedback about the report comment? [Let us know in this issue](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison).\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1388227245, "label": "Add documentation for serving via OpenRC"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/datasette/issues/1826#issuecomment-1260357878", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1826", "id": 1260357878, "node_id": "IC_kwDOBm6k_c5LH4j2", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-09-28T04:05:45Z", "updated_at": "2022-09-28T04:05:45Z", "author_association": "OWNER", "body": "Though now I notice that the copy right there needs to be updated to reflect the new `row` parameter to `render_cell`!", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1388631785, "label": "render_cell documentation example doesn't match the method signature"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/datasette/issues/1826#issuecomment-1260357583", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1826", "id": 1260357583, "node_id": "IC_kwDOBm6k_c5LH4fP", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-09-28T04:05:16Z", "updated_at": "2022-09-28T04:05:16Z", "author_association": "OWNER", "body": "This is deliberate. The Datasette plugin system allows you to specify only a subset of the parameters for a hook - in this example, only the `value` is needed so the others can be omitted.\r\n\r\nThere's a note about this at the very top of that documentation page: https://docs.datasette.io/en/stable/plugin_hooks.html#plugin-hooks\r\n\r\n> When you implement a plugin hook you can accept any or all of the parameters that are documented as being passed to that hook.\r\n> \r\n> For example, you can implement the `render_cell` plugin hook like this even though the full documented hook signature is `render_cell(value, column, table, database, datasette)`:\r\n> ```python\r\n> @hookimpl\r\n> def render_cell(value, column):\r\n> if column == \"stars\":\r\n> return \"*\" * int(value)\r\n> ```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1388631785, "label": "render_cell documentation example doesn't match the method signature"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/datasette/issues/526#issuecomment-1260355224", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/526", "id": 1260355224, "node_id": "IC_kwDOBm6k_c5LH36Y", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-09-28T04:01:25Z", "updated_at": "2022-09-28T04:01:25Z", "author_association": "OWNER", "body": "The ultimate protection against those memory bombs is to support more streaming output formats. Related issues:\r\n\r\n- #1177 \r\n- #1062", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 459882902, "label": "Stream all results for arbitrary SQL and canned queries"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/datasette/issues/526#issuecomment-1259718517", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/526", "id": 1259718517, "node_id": "IC_kwDOBm6k_c5LFcd1", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-09-27T16:02:51Z", "updated_at": "2022-09-27T16:04:46Z", "author_association": "CONTRIBUTOR", "body": "i think that `max_returned_rows` **is** a defense mechanism, just not for connection exhaustion. `max_returned_rows` is a defense mechanism against **memory bombs**.\r\n\r\nif you are potentially yielding out hundreds of thousands or even millions of rows, you need to be quite careful about data flow to not run out of memory on the server, or on the client.\r\n\r\nyou have a lot of places in your code that are protective of that right now, but `max_returned_rows` acts as the final backstop.\r\n\r\nso, given that, it makes sense to have removing `max_returned_rows` altogether be a non-goal, but instead allow for for specific codepaths (like streaming csv's) be able to bypass.\r\n\r\nthat could dramatically lower the surface area for a memory-bomb attack.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 459882902, "label": "Stream all results for arbitrary SQL and canned queries"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/datasette/issues/526#issuecomment-1259693536", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/526", "id": 1259693536, "node_id": "IC_kwDOBm6k_c5LFWXg", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-09-27T15:42:55Z", "updated_at": "2022-09-27T15:42:55Z", "author_association": "OWNER", "body": "It's interesting to note WHY the time limit works against this so well.\r\n\r\nThe time limit as-implemented looks like this:\r\n\r\nhttps://github.com/simonw/datasette/blob/5f9f567acbc58c9fcd88af440e68034510fb5d2b/datasette/utils/__init__.py#L181-L201\r\n\r\nThe key here is `conn.set_progress_handler(handler, n)` - which specifies that the handler function should be called every `n` SQLite operations.\r\n\r\nThe handler function then checks to see if too much time has transpired and conditionally cancels the query.\r\n\r\nThis also doubles up as a \"maximum number of operations\" guard, which is what's happening when you attempt to fetch an infinite number of rows from an infinite table.\r\n\r\nThat limit code could even be extended to say \"exit the query after either 5s or 50,000,000 operations\".\r\n\r\nI don't think that's necessary though.\r\n\r\nTo be honest I'm having trouble with the idea of dropping `max_returned_rows` mainly because what Datasette does (allow arbitrary untrusted SQL queries) is dangerous, so I've designed in multiple redundant defence-in-depth mechanisms right from the start.", "reactions": "{\"total_count\": 1, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 1, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 459882902, "label": "Stream all results for arbitrary SQL and canned queries"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/datasette/issues/526#issuecomment-1258910228", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/526", "id": 1258910228, "node_id": "IC_kwDOBm6k_c5LCXIU", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-09-27T03:11:07Z", "updated_at": "2022-09-27T03:11:07Z", "author_association": "CONTRIBUTOR", "body": "i think this feature would be safe, as it's really only the time limit that can, and imo, should protect against long-running queries, as it is pretty easy to make very expensive queries that don't return many rows.\r\n\r\nmoving away from `max_returned_rows` will require some thinking about:\r\n\r\n1. memory usage and data flows to handle potentially very large result sets\r\n2. how to avoid rendering tens or hundreds of thousands of [html rows](#1655).", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 459882902, "label": "Stream all results for arbitrary SQL and canned queries"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/datasette/issues/526#issuecomment-1258906440", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/526", "id": 1258906440, "node_id": "IC_kwDOBm6k_c5LCWNI", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-09-27T03:04:37Z", "updated_at": "2022-09-27T03:04:37Z", "author_association": "OWNER", "body": "It would be really neat if we could explore this idea in a plugin, but I don't think Datasette has plugin hooks in the right place for that at the moment.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 459882902, "label": "Stream all results for arbitrary SQL and canned queries"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/datasette/issues/526#issuecomment-1258905781", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/526", "id": 1258905781, "node_id": "IC_kwDOBm6k_c5LCWC1", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-09-27T03:03:35Z", "updated_at": "2022-09-27T03:03:47Z", "author_association": "OWNER", "body": "Yes good point, the time limit does already protect against that. I've been contemplating a permissioned-users-only relaxation of that time limit too, and I got that idea mixed up with this one in my head.\r\n\r\nOn that basis maybe this feature would be safe after all? Would need to do some testing, but it may be that the existing time limit provides enough protection here already.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 459882902, "label": "Stream all results for arbitrary SQL and canned queries"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/datasette/issues/526#issuecomment-1258878311", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/526", "id": 1258878311, "node_id": "IC_kwDOBm6k_c5LCPVn", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-09-27T02:19:48Z", "updated_at": "2022-09-27T02:19:48Z", "author_association": "CONTRIBUTOR", "body": "this sql query doesn't trip up `max_returned_rows` but does time out\r\n\r\n```sql\r\nwith recursive counter(x) as (\r\n  select 0\r\n  union\r\n  select x + 1 from counter\r\n)\r\nselect * from counter LIMIT 10 OFFSET 100000000\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 459882902, "label": "Stream all results for arbitrary SQL and canned queries"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/datasette/issues/526#issuecomment-1258871525", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/526", "id": 1258871525, "node_id": "IC_kwDOBm6k_c5LCNrl", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-09-27T02:09:32Z", "updated_at": "2022-09-27T02:14:53Z", "author_association": "CONTRIBUTOR", "body": "thanks @simonw, i learned something i didn't know about sqlite's execution model!\r\n\r\n> Imagine if Datasette CSVs did allow unlimited retrievals. Someone could hit the CSV endpoint for that recursive query and tie up Datasette's SQL connection effectively forever.\r\n\r\nwhy wouldn't the `sqlite_timelimit` guard prevent that?\r\n\r\n--- \r\non my local version which has the code to [turn off truncations for query csv](#1820), `sqlite_timelimit` does protect me.\r\n\r\n![Screenshot 2022-09-26 at 22-14-31 Error 500](https://user-images.githubusercontent.com/536941/192415680-94b32b7f-868f-4b89-8194-5752d45f6009.png)\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 459882902, "label": "Stream all results for arbitrary SQL and canned queries"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/datasette/issues/526#issuecomment-1258864140", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/526", "id": 1258864140, "node_id": "IC_kwDOBm6k_c5LCL4M", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-09-27T01:55:32Z", "updated_at": "2022-09-27T01:55:32Z", "author_association": "OWNER", "body": "That recursive query is a great example of the kind of thing having a maximum row limit protects against.\r\n\r\nImagine if Datasette CSVs did allow unlimited retrievals. Someone could hit the CSV endpoint for that recursive query and tie up Datasette's SQL connection effectively forever.\r\n\r\nEven if this feature becomes a permission-guarded thing we still need to take that case into account.\r\n\r\nAt the very least it would be good if the query could be cancelled if the client disconnects - so if someone accidentally starts an infinite query they can cancel the request and free up the server resources.\r\n\r\nIt might be a good idea to implement a page that shows \"currently running\" queries and allows users with the right permission to terminate them from that page.\r\n\r\nAnother option: a \"limit of last resort\" - either a very high row limit (10,000,000 perhaps) or even a time limit, saying that all queries will be cancelled if they take longer than thirty minutes or similar.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 459882902, "label": "Stream all results for arbitrary SQL and canned queries"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/datasette/issues/526#issuecomment-1258860845", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/526", "id": 1258860845, "node_id": "IC_kwDOBm6k_c5LCLEt", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-09-27T01:48:31Z", "updated_at": "2022-09-27T01:50:01Z", "author_association": "OWNER", "body": "The protection is supposed to be from this line:\r\n```python\r\nrows = cursor.fetchmany(max_returned_rows + 1)\r\n```\r\nBy capping the call to `.fetchmany()` at `max_returned_rows + 1` (the `+ 1` is to allow detection of whether or not there is a next page) I'm ensuring that Datasette never attempts to iterate over a huge result set.\r\n\r\nSQLite and the `sqlite3` library seem to handle this correctly. Here's an example:\r\n\r\n```pycon\r\n>>> import sqlite3\r\n>>> conn = sqlite3.connect(\":memory:\")\r\n>>> cursor = conn.execute(\"\"\"\r\n... with recursive counter(x) as (\r\n...   select 0\r\n...   union\r\n...   select x + 1 from counter\r\n... )\r\n... select * from counter\"\"\")\r\n>>> cursor.fetchmany(10)\r\n[(0,), (1,), (2,), (3,), (4,), (5,), (6,), (7,), (8,), (9,)]\r\n```\r\n`counter` there is an infinitely long table ([see TIL](https://til.simonwillison.net/sqlite/simple-recursive-cte)) - but we can retrieve the first 10 results without going into an infinite loop.\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 459882902, "label": "Stream all results for arbitrary SQL and canned queries"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/datasette/issues/526#issuecomment-1258849766", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/526", "id": 1258849766, "node_id": "IC_kwDOBm6k_c5LCIXm", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-09-27T01:27:03Z", "updated_at": "2022-09-27T01:27:03Z", "author_association": "CONTRIBUTOR", "body": "i agree with that concern! but if i'm understanding the code correctly, `max_returned_rows` does not protect against long-running queries in any way.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 459882902, "label": "Stream all results for arbitrary SQL and canned queries"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/datasette/issues/526#issuecomment-1258846992", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/526", "id": 1258846992, "node_id": "IC_kwDOBm6k_c5LCHsQ", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-09-27T01:21:41Z", "updated_at": "2022-09-27T01:21:41Z", "author_association": "OWNER", "body": "My main concern here is that public Datasette instances could easily have all of their available database connections consumed by long-running queries - either accidentally or deliberately.\r\n\r\nI do totally understand the need for this feature though. I think it can absolutely make sense provided it's protected by authentication and permissions.\r\n\r\nMaybe even limit the number of concurrent downloads at once such that there's always at least one database connection free for other requests.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 459882902, "label": "Stream all results for arbitrary SQL and canned queries"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/datasette/pull/1823#issuecomment-1258828705", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1823", "id": 1258828705, "node_id": "IC_kwDOBm6k_c5LCDOh", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-09-27T00:45:46Z", "updated_at": "2022-09-27T00:45:46Z", "author_association": "OWNER", "body": "Also need to do a bit more of an audit to see if there is anywhere else that this style should be applied.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1386917344, "label": "Keyword-only arguments for a bunch of internal methods"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/datasette/pull/1823#issuecomment-1258828509", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1823", "id": 1258828509, "node_id": "IC_kwDOBm6k_c5LCDLd", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-09-27T00:45:26Z", "updated_at": "2022-09-27T00:45:26Z", "author_association": "OWNER", "body": "I should update the documentation to reflect this change.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1386917344, "label": "Keyword-only arguments for a bunch of internal methods"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/datasette/issues/1822#issuecomment-1258827688", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1822", "id": 1258827688, "node_id": "IC_kwDOBm6k_c5LCC-o", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-09-27T00:44:04Z", "updated_at": "2022-09-27T00:44:04Z", "author_association": "OWNER", "body": "I'll do this in a PR.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1386854246, "label": "Switch to keyword-only arguments for a bunch of internal methods"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/datasette/issues/1817#issuecomment-1258818028", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1817", "id": 1258818028, "node_id": "IC_kwDOBm6k_c5LCAns", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-09-27T00:27:53Z", "updated_at": "2022-09-27T00:27:53Z", "author_association": "OWNER", "body": "Made a start on this:\r\n```diff\r\ndiff --git a/datasette/hookspecs.py b/datasette/hookspecs.py\r\nindex 34e19664..fe0971e5 100644\r\n--- a/datasette/hookspecs.py\r\n+++ b/datasette/hookspecs.py\r\n@@ -31,25 +31,29 @@ def prepare_jinja2_environment(env, datasette):\r\n \r\n \r\n @hookspec\r\n-def extra_css_urls(template, database, table, columns, view_name, request, datasette):\r\n+def extra_css_urls(\r\n+ template, database, table, columns, sql, params, view_name, request, datasette\r\n+):\r\n \"\"\"Extra CSS URLs added by this plugin\"\"\"\r\n \r\n \r\n @hookspec\r\n-def extra_js_urls(template, database, table, columns, view_name, request, datasette):\r\n+def extra_js_urls(\r\n+ template, database, table, columns, sql, params, view_name, request, datasette\r\n+):\r\n \"\"\"Extra JavaScript URLs added by this plugin\"\"\"\r\n \r\n \r\n @hookspec\r\n def extra_body_script(\r\n- template, database, table, columns, view_name, request, datasette\r\n+ template, database, table, columns, sql, params, view_name, request, datasette\r\n ):\r\n \"\"\"Extra JavaScript code to be included in