{"html_url": "https://github.com/simonw/datasette/issues/1836#issuecomment-1271003212", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1836", "id": 1271003212, "node_id": "IC_kwDOBm6k_c5LwfhM", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-10-07T01:52:04Z", "updated_at": "2022-10-07T01:52:04Z", "author_association": "CONTRIBUTOR", "body": "and if we try immutable mode, which is how things are opened by `datasette inspect` we duplicate the files!!!\r\n\r\n```python\r\n# test_sql_immutable.py\r\nimport sqlite3\r\nimport sys\r\n\r\ndb_name = sys.argv[1]\r\nconn = sqlite3.connect(f'file:/app/{db_name}?immutable=1', uri=True)\r\ncur = conn.cursor()\r\ncur.execute('select count(*) from filing')\r\nprint(cur.fetchone())\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1400374908, "label": "docker image is duplicating db files somehow"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1836#issuecomment-1270992795", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1836", "id": 1270992795, "node_id": "IC_kwDOBm6k_c5Lwc-b", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-10-07T01:29:15Z", "updated_at": "2022-10-07T01:50:14Z", "author_association": "CONTRIBUTOR", "body": "fascinatingly, telling python to open sqlite in read only mode makes this layer have a size of 0\r\n\r\n```python\r\n# test_sql_ro.py\r\nimport sqlite3\r\nimport sys\r\n\r\ndb_name = sys.argv[1]\r\nconn = sqlite3.connect(f'file:/app/{db_name}?mode=ro', uri=True)\r\ncur = conn.cursor()\r\ncur.execute('select count(*) from filing')\r\nprint(cur.fetchone())\r\n```\r\n\r\nthat's quite weird because setting the file permissions to read only didn't do anything. (on reflection, that chmod isn't doing anything because the dockerfile commands are run as root)", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1400374908, "label": "docker image is duplicating db files somehow"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1836#issuecomment-1270988081", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1836", "id": 1270988081, "node_id": "IC_kwDOBm6k_c5Lwb0x", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-10-07T01:19:01Z", "updated_at": "2022-10-07T01:27:35Z", "author_association": "CONTRIBUTOR", "body": "okay, some progress!! 
running some sql against a database file causes that file to get duplicated even if it doesn't apparently change the file.\r\n\r\nmake a little test script like this:\r\n\r\n```python\r\n# test_sql.py\r\nimport sqlite3\r\nimport sys\r\n\r\ndb_name = sys.argv[1]\r\nconn = sqlite3.connect(f'file:/app/{db_name}', uri=True)\r\ncur = conn.cursor()\r\ncur.execute('select count(*) from filing')\r\nprint(cur.fetchone())\r\n```\r\n\r\nthen \r\n\r\n```docker\r\nRUN python test_sql.py nlrb.db\r\n```\r\n\r\nproduced a layer that's the same size as `nlrb.db`!!\r\n\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1400374908, "label": "docker image is duplicating db files somehow"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1836#issuecomment-1270936982", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1836", "id": 1270936982, "node_id": "IC_kwDOBm6k_c5LwPWW", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-10-07T00:52:41Z", "updated_at": "2022-10-07T00:52:41Z", "author_association": "CONTRIBUTOR", "body": "it's not that the inspect command is somehow changing the db files. if i set them to read-only, the \"inspect\" layer still has the same very large size.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1400374908, "label": "docker image is duplicating db files somehow"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1836#issuecomment-1270923537", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1836", "id": 1270923537, "node_id": "IC_kwDOBm6k_c5LwMER", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-10-07T00:46:08Z", "updated_at": "2022-10-07T00:46:08Z", "author_association": "CONTRIBUTOR", "body": "i thought it was maybe to do with reading through all the files, but that does not seem to be the case.\r\n\r\nif i make a little test file like:\r\n\r\n```python\r\n# test_read.py\r\nimport hashlib\r\nimport sys\r\nimport pathlib\r\n\r\nHASH_BLOCK_SIZE = 1024 * 1024\r\n\r\ndef inspect_hash(path):\r\n    \"\"\"Calculate the hash of a database, efficiently.\"\"\"\r\n    m = hashlib.sha256()\r\n    with path.open(\"rb\") as fp:\r\n        while True:\r\n            data = fp.read(HASH_BLOCK_SIZE)\r\n            if not data:\r\n                break\r\n            m.update(data)\r\n\r\n    return m.hexdigest()\r\n\r\ninspect_hash(pathlib.Path(sys.argv[1]))\r\n```\r\n\r\nthen a line in the Dockerfile like\r\n\r\n```docker\r\nRUN python test_read.py nlrb.db && echo \"[]\" > /etc/inspect.json\r\n```\r\n\r\njust produces a layer of `3B`\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1400374908, "label": "docker image is duplicating db files somehow"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1480#issuecomment-1269847461", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1480", "id": 1269847461, "node_id": "IC_kwDOBm6k_c5LsFWl", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-10-06T11:21:49Z", "updated_at": "2022-10-06T11:21:49Z", "author_association": "CONTRIBUTOR", "body": "thanks @simonw, i'll spend a little more time trying to figure out why this isn't working on 
cloudrun, and then will flip over to fly if i can't.\r\n\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1015646369, "label": "Exceeding Cloud Run memory limits when deploying a 4.8G database"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1480#issuecomment-1268629159", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1480", "id": 1268629159, "node_id": "IC_kwDOBm6k_c5Lnb6n", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-10-05T16:00:55Z", "updated_at": "2022-10-05T16:00:55Z", "author_association": "CONTRIBUTOR", "body": "as a next step, i'll fetch the docker image from the google registry, and see what memory and disk usage looks like when i run it locally.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1015646369, "label": "Exceeding Cloud Run memory limits when deploying a 4.8G database"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1480#issuecomment-1268613335", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1480", "id": 1268613335, "node_id": "IC_kwDOBm6k_c5LnYDX", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-10-05T15:45:49Z", "updated_at": "2022-10-05T15:45:49Z", "author_association": "CONTRIBUTOR", "body": "running into this as i continue to grow my labor data warehouse.\r\n\r\nHere a CloudRun PM says the container size should **not** count against memory: https://stackoverflow.com/a/56570717", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1015646369, "label": "Exceeding Cloud Run memory limits when deploying a 4.8G database"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/409#issuecomment-1264223554", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/409", "id": 1264223554, "node_id": "IC_kwDOCGYnMM5LWoVC", "user": {"value": 7908073, "label": "chapmanjacobd"}, "created_at": "2022-10-01T03:42:50Z", "updated_at": "2022-10-01T03:42:50Z", "author_association": "CONTRIBUTOR", "body": "oh weird. 
it inserts into db2", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1149661489, "label": "`with db:` for transactions"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/409#issuecomment-1264223363", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/409", "id": 1264223363, "node_id": "IC_kwDOCGYnMM5LWoSD", "user": {"value": 7908073, "label": "chapmanjacobd"}, "created_at": "2022-10-01T03:41:45Z", "updated_at": "2022-10-01T03:41:45Z", "author_association": "CONTRIBUTOR", "body": "```\r\npytest xklb/check.py --pdb\r\n\r\nxklb/check.py:11: in test_transaction\r\n assert list(db2[\"t\"].rows) == []\r\nE AssertionError: assert [{'foo': 1}] == []\r\nE + where [{'foo': 1}] = list()\r\nE + where = .rows\r\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> entering PDB >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>\r\n\r\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> PDB post_mortem (IO-capturing turned off) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>\r\n> /home/xk/github/xk/lb/xklb/check.py(11)test_transaction()\r\n 9 with db1.conn:\r\n 10 db1[\"t\"].insert({\"foo\": 1})\r\n---> 11 assert list(db2[\"t\"].rows) == []\r\n 12 assert list(db2[\"t\"].rows) == [{\"foo\": 1}]\r\n```\r\n\r\nIt fails because it is already inserted.\r\n\r\nbtw if you put these two lines in your pyproject.toml you can get `ipdb` in pytest\r\n\r\n```\r\n[tool.pytest.ini_options]\r\naddopts = \"--pdbcls=IPython.terminal.debugger:TerminalPdb --ignore=tests/data --capture=tee-sys --log-cli-level=ERROR\"\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1149661489, "label": "`with db:` for transactions"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/493#issuecomment-1264219650", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/493", "id": 1264219650, "node_id": "IC_kwDOCGYnMM5LWnYC", "user": {"value": 7908073, "label": "chapmanjacobd"}, "created_at": "2022-10-01T03:22:50Z", "updated_at": "2022-10-01T03:23:58Z", "author_association": "CONTRIBUTOR", "body": "this is likely what you are looking for: https://stackoverflow.com/a/51076749/697964\r\n\r\nbut yeah I would say just disable smart quotes", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1386562662, "label": "Tiny typographical error in install/uninstall docs"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/370#issuecomment-1261930179", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/370", "id": 1261930179, "node_id": "IC_kwDOBm6k_c5LN4bD", "user": {"value": 72577720, "label": "MichaelTiemannOSC"}, "created_at": "2022-09-29T08:17:46Z", "updated_at": "2022-09-29T08:17:46Z", "author_association": "CONTRIBUTOR", "body": "Just watched this video which demonstrates the integration of *any* webapp into JupyterLab: https://youtu.be/FH1dKKmvFtc\r\n\r\nMaybe this is the answer?", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, 
\"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 377155320, "label": "Integration with JupyterLab"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1062#issuecomment-1260909128", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1062", "id": 1260909128, "node_id": "IC_kwDOBm6k_c5LJ_JI", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-09-28T13:22:53Z", "updated_at": "2022-09-28T14:09:54Z", "author_association": "CONTRIBUTOR", "body": "if you went this route:\r\n\r\n```python\r\nwith sqlite_timelimit(conn, time_limit_ms):\r\n c.execute(query)\r\n for chunk in c.fetchmany(chunk_size):\r\n yield from chunk\r\n```\r\n\r\nthen `time_limit_ms` would probably have to be greatly extended, because the time spent in the loop will depend on the downstream processing.\r\n\r\ni wonder if this was why you were thinking this feature would need a dedicated connection?\r\n\r\n---\r\n\r\nreading more, there's no real limit i can find on the number of active cursors (or more precisely active prepared statements objects, because sqlite doesn't really have cursors). \r\n\r\nmaybe something like this would be okay?\r\n\r\n```python\r\nwith sqlite_timelimit(conn, time_limit_ms):\r\n c.execute(query)\r\n # step through at least one to evaluate the statement, not sure if this is necessary\r\n yield c.execute.fetchone()\r\nfor chunk in c.fetchmany(chunk_size):\r\n yield from chunk\r\n```\r\n\r\nthis seems quite weird that there's not more of limit of the number of active prepared statements, but i haven't been able to find one.\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 732674148, "label": "Refactor .csv to be an output renderer - and teach register_output_renderer to stream all rows"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1062#issuecomment-1260829829", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1062", "id": 1260829829, "node_id": "IC_kwDOBm6k_c5LJryF", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-09-28T12:27:19Z", "updated_at": "2022-09-28T12:27:19Z", "author_association": "CONTRIBUTOR", "body": "for teaching `register_output_renderer` to stream it seems like the two options are to\r\n\r\n1. a [nested query technique ](https://github.com/simonw/datasette/issues/526#issuecomment-505162238)to paginate through\r\n2. 
a fetching model that looks something like\r\n```python\r\nwith sqlite_timelimit(conn, time_limit_ms):\r\n    c.execute(query)\r\n    for chunk in c.fetchmany(chunk_size):\r\n        yield from chunk\r\n```\r\ncurrently `db.execute` is not a generator, so this would probably need a new method?", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 732674148, "label": "Refactor .csv to be an output renderer - and teach register_output_renderer to stream all rows"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/526#issuecomment-1259718517", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/526", "id": 1259718517, "node_id": "IC_kwDOBm6k_c5LFcd1", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-09-27T16:02:51Z", "updated_at": "2022-09-27T16:04:46Z", "author_association": "CONTRIBUTOR", "body": "i think that `max_returned_rows` **is** a defense mechanism, just not for connection exhaustion. `max_returned_rows` is a defense mechanism against **memory bombs**.\r\n\r\nif you are potentially yielding out hundreds of thousands or even millions of rows, you need to be quite careful about data flow to not run out of memory on the server, or on the client.\r\n\r\nyou have a lot of places in your code that are protective of that right now, but `max_returned_rows` acts as the final backstop.\r\n\r\nso, given that, it makes sense to have removing `max_returned_rows` altogether be a non-goal, but instead to allow specific codepaths (like streaming csv's) to bypass it.\r\n\r\nthat could dramatically lower the surface area for a memory-bomb attack.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 459882902, "label": "Stream all results for arbitrary SQL and canned queries"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/526#issuecomment-1258910228", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/526", "id": 1258910228, "node_id": "IC_kwDOBm6k_c5LCXIU", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-09-27T03:11:07Z", "updated_at": "2022-09-27T03:11:07Z", "author_association": "CONTRIBUTOR", "body": "i think this feature would be safe, as it's really only the time limit that can, and imo should, protect against long-running queries, as it is pretty easy to make very expensive queries that don't return many rows.\r\n\r\nmoving away from `max_returned_rows` will require some thinking about:\r\n\r\n1. memory usage and data flows to handle potentially very large result sets\r\n2. 
how to avoid rendering tens or hundreds of thousands of [html rows](#1655).", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 459882902, "label": "Stream all results for arbitrary SQL and canned queries"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/526#issuecomment-1258878311", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/526", "id": 1258878311, "node_id": "IC_kwDOBm6k_c5LCPVn", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-09-27T02:19:48Z", "updated_at": "2022-09-27T02:19:48Z", "author_association": "CONTRIBUTOR", "body": "this sql query doesn't trip up `maximum_returned_rows` but does time out\r\n\r\n```sql\r\nwith recursive counter(x) as (\r\n    select 0\r\n    union\r\n    select x + 1 from counter\r\n)\r\nselect * from counter LIMIT 10 OFFSET 100000000\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 459882902, "label": "Stream all results for arbitrary SQL and canned queries"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/526#issuecomment-1258871525", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/526", "id": 1258871525, "node_id": "IC_kwDOBm6k_c5LCNrl", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-09-27T02:09:32Z", "updated_at": "2022-09-27T02:14:53Z", "author_association": "CONTRIBUTOR", "body": "thanks @simonw, i learned something i didn't know about sqlite's execution model!\r\n\r\n> Imagine if Datasette CSVs did allow unlimited retrievals. Someone could hit the CSV endpoint for that recursive query and tie up Datasette's SQL connection effectively forever.\r\n\r\nwhy wouldn't the `sqlite_timelimit` guard prevent that?\r\n\r\n--- \r\non my local version which has the code to [turn off truncations for query csv](#1820), `sqlite_timelimit` does protect me.\r\n\r\n![Screenshot 2022-09-26 at 22-14-31 Error 500](https://user-images.githubusercontent.com/536941/192415680-94b32b7f-868f-4b89-8194-5752d45f6009.png)\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 459882902, "label": "Stream all results for arbitrary SQL and canned queries"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/526#issuecomment-1258849766", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/526", "id": 1258849766, "node_id": "IC_kwDOBm6k_c5LCIXm", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-09-27T01:27:03Z", "updated_at": "2022-09-27T01:27:03Z", "author_association": "CONTRIBUTOR", "body": "i agree with that concern! 
but if i'm understanding the code correctly, `maximum_returned_rows` does not protect against long-running queries in any way.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 459882902, "label": "Stream all results for arbitrary SQL and canned queries"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/pull/1820#issuecomment-1258803261", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1820", "id": 1258803261, "node_id": "IC_kwDOBm6k_c5LB9A9", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-09-27T00:03:09Z", "updated_at": "2022-09-27T00:03:09Z", "author_association": "CONTRIBUTOR", "body": "the pattern in this PR: `max_returned_rows` controls the maximum rows rendered through html and json, and the csv render bypasses that.\r\n\r\ni think it would be better to have each of these different query renderers have more direct control over how many rows to fetch, instead of relying on the internals of the `execute` method.\r\n\r\ngenerally, users will not want to paginate through tens of thousands of results, but often will want to download a full query as json or as csv. ", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1386456717, "label": "[SPIKE] Don't truncate query CSVs"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/491#issuecomment-1258712931", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/491", "id": 1258712931, "node_id": "IC_kwDOCGYnMM5LBm9j", "user": {"value": 25778, "label": "eyeseast"}, "created_at": "2022-09-26T22:31:58Z", "updated_at": "2022-09-26T22:31:58Z", "author_association": "CONTRIBUTOR", "body": "Right. The backup command will copy tables completely, but in the case of conflicting table names, the destination gets overwritten silently. That might not be what you want here. 
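\r\n\r\nFor reference, a minimal sketch of the stdlib API (hypothetical file names; `Connection.backup` does a page-level copy, so the destination's existing contents are replaced rather than merged):\r\n\r\n```python\r\nimport sqlite3\r\n\r\nsrc = sqlite3.connect(\"cats.db\")\r\ndest = sqlite3.connect(\"animals.db\")\r\nsrc.backup(dest)  # copies every page of src over dest\r\nsrc.close()\r\ndest.close()\r\n```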
", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1383646615, "label": "Ability to merge databases and tables"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/491#issuecomment-1258508215", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/491", "id": 1258508215, "node_id": "IC_kwDOCGYnMM5LA0-3", "user": {"value": 25778, "label": "eyeseast"}, "created_at": "2022-09-26T19:22:14Z", "updated_at": "2022-09-26T19:22:14Z", "author_association": "CONTRIBUTOR", "body": "This might be fairly straightforward using SQLite's backup utility: https://docs.python.org/3/library/sqlite3.html#sqlite3.Connection.backup\r\n\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1383646615, "label": "Ability to merge databases and tables"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/526#issuecomment-1258337011", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/526", "id": 1258337011, "node_id": "IC_kwDOBm6k_c5LALLz", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-09-26T16:49:48Z", "updated_at": "2022-09-26T16:49:48Z", "author_association": "CONTRIBUTOR", "body": "i think the smallest change that gets close to what i want is to change the behavior so that `max_returned_rows` is not applied in the `execute` method when we are are asking for a csv of query.\r\n\r\nthere are some infelicities for that approach, but i'll make a PR to make it easier to discuss.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 459882902, "label": "Stream all results for arbitrary SQL and canned queries"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/526#issuecomment-1258167564", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/526", "id": 1258167564, "node_id": "IC_kwDOBm6k_c5K_h0M", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-09-26T14:57:44Z", "updated_at": "2022-09-26T15:08:36Z", "author_association": "CONTRIBUTOR", "body": "reading the database execute method i have a few questions.\r\n\r\nhttps://github.com/simonw/datasette/blob/cb1e093fd361b758120aefc1a444df02462389a3/datasette/database.py#L229-L242\r\n\r\n---\r\nunless i'm missing something (which is very likely!!), the `max_returned_rows` argument doesn't actually offer any protections against running very expensive queries. \r\n\r\nIt's not like adding a `LIMIT max_rows` argument. it make sense that it isn't because, the query could already have an `LIMIT` argument. Doing something like `select * from (query) limit {max_returned_rows}` **might** be protective but wouldn't always.\r\n\r\nInstead the code executes the full original query, and if still has time it fetches out the first `max_rows + 1` rows. 
\r\n\r\nthis *does* offer some protection against memory exhaustion, as you won't hydrate a huge result set into python (however, there are [data flow patterns](https://github.com/simonw/datasette/issues/1727#issuecomment-1258129113) that could avoid that too)\r\n\r\ngiven the current architecture, i don't see how creating a new connection would be of use?\r\n\r\n---\r\n\r\nIf we just removed the `max_returned_rows` limitation, then i think most things would be fine **except** for the QueryViews. Right now, rendering just [5000 rows takes a lot of client-side memory](https://github.com/simonw/datasette/issues/1655), so some form of pagination would be required.\r\n\r\n\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 459882902, "label": "Stream all results for arbitrary SQL and canned queries"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1655#issuecomment-1258166572", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1655", "id": 1258166572, "node_id": "IC_kwDOBm6k_c5K_hks", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-09-26T14:57:04Z", "updated_at": "2022-09-26T14:57:04Z", "author_association": "CONTRIBUTOR", "body": "I think that paginating, even in javascript, could be very helpful. Maybe render json or csv into the page and let javascript load that into the dom?", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1163369515, "label": "query result page is using 400mb of browser memory 40x size of html page and 400x size of csv data"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1727#issuecomment-1258129113", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1727", "id": 1258129113, "node_id": "IC_kwDOBm6k_c5K_YbZ", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-09-26T14:30:11Z", "updated_at": "2022-09-26T14:48:31Z", "author_association": "CONTRIBUTOR", "body": "from your analysis, it seems like the GIL is blocking on loading of the data from sqlite to python (particularly in the `fetchmany` call)\r\n\r\nthis is probably a simplistic idea, but what if you had the python code in the `execute` method iterate over the cursor and yield out rows or small chunks of rows.\r\n\r\nsomething like: \r\n```python\r\nwith sqlite_timelimit(conn, time_limit_ms):\r\n    try:\r\n        cursor = conn.cursor()\r\n        cursor.execute(sql, params if params is not None else {})\r\n    except:\r\n        ...\r\n    max_returned_rows = self.ds.max_returned_rows\r\n    if max_returned_rows == page_size:\r\n        max_returned_rows += 1\r\n    if max_returned_rows and truncate:\r\n        for i, row in enumerate(cursor):\r\n            yield row\r\n            if i == max_returned_rows - 1:\r\n                break\r\n    else:\r\n        for row in cursor:\r\n            yield row\r\n        truncated = False\r\n```\r\n\r\nthis kind of thing works well with a postgres server side cursor, but i'm not sure if it will hold for sqlite. \r\n\r\nyou would still spend about the same amount of time in python and would be contending for the gil, but it could be non-blocking.\r\n\r\ndepending on the data flow, this could also have some benefit for memory. 
(data stays in more compact sqlite-land until you need it)", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1217759117, "label": "Research: demonstrate if parallel SQL queries are worthwhile"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/491#issuecomment-1256858763", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/491", "id": 1256858763, "node_id": "IC_kwDOCGYnMM5K6iSL", "user": {"value": 7908073, "label": "chapmanjacobd"}, "created_at": "2022-09-24T04:50:59Z", "updated_at": "2022-09-24T04:52:08Z", "author_association": "CONTRIBUTOR", "body": "Instead of outputting binary data to stdout, the interface might be better like this\r\n\r\n```\r\nsqlite-utils merge animals.db cats.db dogs.db\r\n```\r\n\r\nsimilar to `zip`, `ogr2ogr`, etc\r\n\r\nActually I think this might already be possible within `ogr2ogr`. I don't believe spatial data is a requirement, though it might add an `ogc_id` column or something\r\n\r\n```\r\ncp cats.db animals.db\r\nogr2ogr -append animals.db dogs.db\r\nogr2ogr -append animals.db another.db\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1383646615, "label": "Ability to merge databases and tables"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1817#issuecomment-1256781274", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1817", "id": 1256781274, "node_id": "IC_kwDOBm6k_c5K6PXa", "user": {"value": 50527, "label": "jefftriplett"}, "created_at": "2022-09-23T22:59:46Z", "updated_at": "2022-09-23T22:59:46Z", "author_association": "CONTRIBUTOR", "body": "While you are adding features, would you be future-proofing your APIs if you switched some arguments over to keyword-only arguments, or would that be too disruptive?\r\n\r\nThinking out loud:\r\n\r\n```\r\nasync def render_template(\r\n    self, templates, *, context=None, plugin_context=None, request=None, view_name=None\r\n):\r\n```\r\n\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1384273985, "label": "Expose `sql` and `params` arguments to various plugin hooks"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/526#issuecomment-1254064260", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/526", "id": 1254064260, "node_id": "IC_kwDOBm6k_c5Kv4CE", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-09-21T18:17:04Z", "updated_at": "2022-09-21T18:18:01Z", "author_association": "CONTRIBUTOR", "body": "hi @simonw, this is becoming more of a bother for my [labor data warehouse](https://labordata.bunkum.us/). 
Is there any research or a spike i could do that would help you investigate this issue?", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 459882902, "label": "Stream all results for arbitrary SQL and canned queries"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/433#issuecomment-1252898131", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/433", "id": 1252898131, "node_id": "IC_kwDOCGYnMM5KrbVT", "user": {"value": 7908073, "label": "chapmanjacobd"}, "created_at": "2022-09-20T20:51:21Z", "updated_at": "2022-09-20T20:56:07Z", "author_association": "CONTRIBUTOR", "body": "When I run `reset` it fixes my terminal. I suspect it is related to the progress bar\r\n\r\nhttps://linux.die.net/man/1/reset\r\n\r\n```\r\n950 1s /m/d/03_Downloads \ud83d\udc11 echo $TERM\r\nxterm-kitty\r\n\u2593\u2591\u2592\u2591 /m/d/03_Downloads \ud83c\udf0f kitty -v\r\nkitty 0.26.2 created by Kovid Goyal\r\n$ sqlite-utils insert test.db facility facility-boundary-us-all.csv --csv\r\nblah blah blah (no offense)\r\n$ \r\n$ reset\r\n$ \r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1239034903, "label": "CLI eats my cursor"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1813#issuecomment-1250901367", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1813", "id": 1250901367, "node_id": "IC_kwDOBm6k_c5Kjz13", "user": {"value": 883348, "label": "adipasquale"}, "created_at": "2022-09-19T11:34:45Z", "updated_at": "2022-09-19T11:34:45Z", "author_association": "CONTRIBUTOR", "body": "oh and by writing this I just realized the difference: the URL on fly.io is with a custom SQL command whereas the local one is without. 
\r\nIt seems that there is no pagination when using custom SQL commands, which makes sense.\r\n\r\nSorry for this useless issue, maybe this can be useful for someone else / me in the future.\r\n\r\nThanks again for this wonderful project!", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1377811868, "label": "missing next and next_url in JSON responses from an instance deployed on Fly "}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1810#issuecomment-1248204219", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1810", "id": 1248204219, "node_id": "IC_kwDOBm6k_c5KZhW7", "user": {"value": 82988, "label": "psychemedia"}, "created_at": "2022-09-15T14:44:47Z", "updated_at": "2022-09-15T14:46:26Z", "author_association": "CONTRIBUTOR", "body": "A couple+ of possible use case examples:\r\n\r\n- someone has a collection of articles indexed with FTS; they want to publish a simple search tool over the results;\r\n- someone has an image collection and they want to be able to search over description text to return images;\r\n- someone has a set of locations with descriptions, and wants to run a query over places and descriptions and get results as a listing or on a map;\r\n- someone has a set of audio or video files with titles, descriptions and/or transcripts, and wants to be able to search over them and return playable versions of returned items.\r\n\r\nIn many cases, I suspect the raw content will be in one table, but the search table will be a second (eg FTS) table. Generally, the search may be over one or more joined tables, and the results constructed from one or more tables (which may or may not be distinct from the search tables).", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1374626873, "label": "Featured table(s) on the homepage"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/pull/1685#issuecomment-1237381620", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1685", "id": 1237381620, "node_id": "IC_kwDOBm6k_c5JwPH0", "user": {"value": 49699333, "label": "dependabot[bot]"}, "created_at": "2022-09-05T18:36:47Z", "updated_at": "2022-09-05T18:36:47Z", "author_association": "CONTRIBUTOR", "body": "Looks like jinja2 is no longer updatable, so this is no longer needed.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1180778860, "label": "Update jinja2 requirement from <3.1.0,>=2.10.3 to >=2.10.3,<3.2.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/pull/1799#issuecomment-1237381569", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1799", "id": 1237381569, "node_id": "IC_kwDOBm6k_c5JwPHB", "user": {"value": 49699333, "label": "dependabot[bot]"}, "created_at": "2022-09-05T18:36:42Z", "updated_at": "2022-09-05T18:36:42Z", "author_association": "CONTRIBUTOR", "body": "Looks like aiofiles is no longer updatable, so this is no longer needed.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1362242558, "label": "Update aiofiles requirement from <0.9,>=0.4 to 
>=0.4,<22.2"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/pull/480#issuecomment-1232356302", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/480", "id": 1232356302, "node_id": "IC_kwDOCGYnMM5JdEPO", "user": {"value": 7908073, "label": "chapmanjacobd"}, "created_at": "2022-08-31T01:51:49Z", "updated_at": "2022-08-31T01:51:49Z", "author_association": "CONTRIBUTOR", "body": "Thanks for pointing me to the right place", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1355433619, "label": "search_sql add include_rank option"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/467#issuecomment-1224382336", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/467", "id": 1224382336, "node_id": "IC_kwDOCGYnMM5I-peA", "user": {"value": 50527, "label": "jefftriplett"}, "created_at": "2022-08-23T17:16:13Z", "updated_at": "2022-08-23T17:16:13Z", "author_association": "CONTRIBUTOR", "body": "> Should passing `alter=True` also drop any columns that aren't included in the new table structure?\r\n> \r\n> It could even spot column types that aren't correct and fix those.\r\n> \r\n> Is that consistent with the expectations set by how `alter=True` works elsewhere?\r\n\r\nI would lean towards not dropping them (or making a `drop=True` or `drop_columns=True`or `drop_missing_columns=True`) to work with existing tables easier. \r\n\r\nI do like that sqlite-utils mostly just works with existing tables but it's also nice to add to existing fields in a few cases. \r\n\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1348169997, "label": "Mechanism for ensuring a table has all the columns"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/pull/1789#issuecomment-1223347322", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1789", "id": 1223347322, "node_id": "IC_kwDOBm6k_c5I6sx6", "user": {"value": 15178711, "label": "asg017"}, "created_at": "2022-08-23T00:03:20Z", "updated_at": "2022-08-23T00:03:20Z", "author_association": "CONTRIBUTOR", "body": "@simonw to build the extension on ubuntu, you can run:\r\n\r\n```\r\napt-get update && apt-get install libsqlite3-dev gcc\r\ngcc ext.c -fPIC -shared -o ext.so\r\n```\r\n\r\nI'm not the best with Actions, but if you set the cache key to `ext.c`, run those two commands to download dependencies + compile to `ext.so`, then the unit test should pick it up and run it correctly. 
Let me know if you want me to update the PR with that added", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1344823170, "label": "Add new entrypoint option to `--load-extension`"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/pull/1789#issuecomment-1221576460", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1789", "id": 1221576460, "node_id": "IC_kwDOBm6k_c5Iz8cM", "user": {"value": 15178711, "label": "asg017"}, "created_at": "2022-08-21T16:16:42Z", "updated_at": "2022-08-21T16:16:42Z", "author_association": "CONTRIBUTOR", "body": "Rebased, the Read the Docs failure should now be fixed\r\n\r\nRe docs - ya that's a pretty ambitious page, I'm still not 100% sure what the best practices are/should be... Would be happy to make that page in a future PR", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1344823170, "label": "Add new entrypoint option to `--load-extension`"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1779#issuecomment-1214437408", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1779", "id": 1214437408, "node_id": "IC_kwDOBm6k_c5IYtgg", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-08-14T19:42:58Z", "updated_at": "2022-08-14T19:42:58Z", "author_association": "CONTRIBUTOR", "body": "thanks @simonw!", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1334628400, "label": "google cloudrun updated their limits on maxscale based on memory and cpu count"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1779#issuecomment-1210675046", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1779", "id": 1210675046, "node_id": "IC_kwDOBm6k_c5IKW9m", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-08-10T13:28:37Z", "updated_at": "2022-08-10T13:28:37Z", "author_association": "CONTRIBUTOR", "body": "maybe a simpler solution is to set the maxscale to like 2? since datasette is not set up to make use of container scaling anyway?", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1334628400, "label": "google cloudrun updated their limits on maxscale based on memory and cpu count"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1191#issuecomment-1200732975", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1191", "id": 1200732975, "node_id": "IC_kwDOBm6k_c5Hkbsv", "user": {"value": 2670795, "label": "brandonrobertz"}, "created_at": "2022-08-01T05:39:27Z", "updated_at": "2022-08-01T05:39:27Z", "author_association": "CONTRIBUTOR", "body": "I've got a URL shortening plugin that I would like to embed on the query page but I'd like to avoid capturing the entire `query.html` template. A feature like this would solve it. 
Where's this at and how can I help?", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 787098345, "label": "Ability for plugins to collaborate when adding extra HTML to blocks in default templates"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/456#issuecomment-1190277829", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/456", "id": 1190277829, "node_id": "IC_kwDOCGYnMM5G8jLF", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-07-20T13:19:15Z", "updated_at": "2022-07-20T13:19:15Z", "author_association": "CONTRIBUTOR", "body": "hadley wickham's melt and reshape could be good inspo: http://had.co.nz/reshape/introduction.pdf", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1310243385, "label": "feature request: pivot command"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/456#issuecomment-1190272780", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/456", "id": 1190272780, "node_id": "IC_kwDOCGYnMM5G8h8M", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-07-20T13:14:54Z", "updated_at": "2022-07-20T13:14:54Z", "author_association": "CONTRIBUTOR", "body": "for example, i have data on votes that look like this:\r\n\r\n| ballot_id | option_id | choice |\r\n|-|-|-|\r\n| 1 | 1 | 0 | \r\n| 1 | 2 | 1 |\r\n| 1 | 3 | 0 |\r\n| 1 | 4 | 1 |\r\n| 2 | 1 | 1 |\r\n| 2 | 2 | 0 |\r\n| 2 | 3 | 1 |\r\n| 2 | 4 | 0 |\r\n\r\nand i want to reshape from this long form to this wide form:\r\n\r\n| ballot_id | option_id_1 | option_id_2 | option_id_3 | option_id_4 |\r\n|-|-|-|-|-|\r\n| 1 | 0 | 1 | 0 | 1 | \r\n| 2 | 1 | 0 | 1 | 0 | \r\n\r\ni could do such a thing like this:\r\n\r\n```sql\r\nselect ballot_id, \r\nsum(choice) filter (where option_id = 1) as option_id_1,\r\nsum(choice) filter (where option_id = 2) as option_id_2,\r\nsum(choice) filter (where option_id = 3) as option_id_3,\r\nsum(choice) filter (where option_id = 4) as option_id_4\r\nfrom vote\r\ngroup by ballot_id\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1310243385, "label": "feature request: pivot command"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/423#issuecomment-1189010812", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/423", "id": 1189010812, "node_id": "IC_kwDOCGYnMM5G3t18", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-07-19T12:47:39Z", "updated_at": "2022-07-19T12:47:39Z", "author_association": "CONTRIBUTOR", "body": "just ran into this!", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1199158210, "label": ".extract() doesn't set foreign key when extracted columns contain NULL value"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/449#issuecomment-1179579878", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/449", "id": 1179579878, "node_id": "IC_kwDOCGYnMM5GTvXm", "user": {"value": 1690072, "label": "davidleejy"}, "created_at": 
"2022-07-09T17:41:32Z", "updated_at": "2022-07-09T17:41:50Z", "author_association": "CONTRIBUTOR", "body": "Learnt that the types in Sqlite-utils differ somewhat from those in Sqlite. I've changed my test to account for this difference and the test has passed successfully. I will submit a PR.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1279863844, "label": "Utilities for duplicating tables and creating a table with the results of a query"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/449#issuecomment-1174027079", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/449", "id": 1174027079, "node_id": "IC_kwDOCGYnMM5F-jtH", "user": {"value": 1690072, "label": "davidleejy"}, "created_at": "2022-07-04T17:33:04Z", "updated_at": "2022-07-04T17:48:43Z", "author_association": "CONTRIBUTOR", "body": "I've written the code and test. Would you be able to advise how to compare table columns in a pytest function properly? Experiencing a challenge when comparing columns.\r\n\r\nTest:\r\n```python\r\ndef test_duplicate(fresh_db):\r\n table = fresh_db.create_table(\r\n \"table1\",\r\n {\r\n \"text_col\": str,\r\n \"float_col\": float,\r\n \"int_col\": int,\r\n \"bool_col\": bool,\r\n \"bytes_col\": bytes,\r\n \"datetime_col\": datetime.datetime,\r\n },\r\n )\r\n dt = datetime.datetime.now()\r\n b = bytes('hello world', 'utf-8')\r\n data = {\"text_col\": \"Cleo\", \r\n \"float_col\": 3.14,\r\n \"int_col\": -2,\r\n \"bool_col\": True,\r\n \"bytes_col\": b,\r\n \"datetime_col\": str(dt)}\r\n table1 = fresh_db[\"table1\"]\r\n row_id = table1.insert(data).last_rowid\r\n table1.duplicate('table2')\r\n table2 = fresh_db[\"table2\"]\r\n assert data == table2.get(row_id)\r\n assert table1.columns == table2.columns # FAILS HERE\r\n```\r\n\r\nResult:\r\n![Screenshot 2022-07-05 at 1 31 55 AM](https://user-images.githubusercontent.com/1690072/177198814-daac48c9-5746-49d0-a14a-14fe181c5a2f.png)\r\n\r\nFailure is due to column types being named differently -- e.g. 'FLOAT' vs 'REAL', 'INTEGER' vs 'INT'. How should I go about comparing columns while accounting for equivalent types?\r\n\r\nOr did I miss out something in my duplication code correctly? 
Here's how I did it: in `db.py`, I've added the following code:\r\n```python\r\nclass Table(Queryable):\r\n    [...]\r\n    def duplicate(\r\n        self,\r\n        name_new: str\r\n    ) -> \"Table\":\r\n        \"\"\"\r\n        Duplicate this table in this database.\r\n\r\n        :param name_new: Name of new table.\r\n        \"\"\"\r\n        assert self.exists()\r\n        with self.db.conn:\r\n            sql = \"CREATE TABLE [{new_table}] AS SELECT * FROM [{table}];\".format(\r\n                new_table = name_new,\r\n                table = self.name,\r\n            )\r\n            self.db.execute(sql)\r\n        return self.db[name_new]\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1279863844, "label": "Utilities for duplicating tables and creating a table with the results of a query"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1713#issuecomment-1173358747", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1713", "id": 1173358747, "node_id": "IC_kwDOBm6k_c5F8Aib", "user": {"value": 2670795, "label": "brandonrobertz"}, "created_at": "2022-07-04T05:16:35Z", "updated_at": "2022-07-04T05:16:35Z", "author_association": "CONTRIBUTOR", "body": "This feature is pretty important and it would be nice if it were all within Datasette (no separate CLI/deploy required). My workflow now is to basically just copy the result and paste into a Google Sheet, which works, but then it's not discoverable to other journalists browsing the Datasette instance. I started building a plugin similar to [datasette-saved-queries](https://datasette.io/plugins/datasette-saved-queries) but one that maintains its own DB (required if you're working with all immutable DBs), but got bogged down in details.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1203943272, "label": "Datasette feature for publishing snapshots of query results"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/pull/1693#issuecomment-1168704157", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1693", "id": 1168704157, "node_id": "IC_kwDOBm6k_c5FqQKd", "user": {"value": 49699333, "label": "dependabot[bot]"}, "created_at": "2022-06-28T13:11:36Z", "updated_at": "2022-06-28T13:11:36Z", "author_association": "CONTRIBUTOR", "body": "Superseded by #1763.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1184850337, "label": "Bump black from 22.1.0 to 22.3.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/pull/1753#issuecomment-1163091750", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1753", "id": 1163091750, "node_id": "IC_kwDOBm6k_c5FU18m", "user": {"value": 49699333, "label": "dependabot[bot]"}, "created_at": "2022-06-22T13:22:34Z", "updated_at": "2022-06-22T13:22:34Z", "author_association": "CONTRIBUTOR", "body": "Superseded by #1760.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1261826957, "label": "Bump furo from 2022.4.7 to 2022.6.4.1"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1528#issuecomment-1151887842", "issue_url": 
"https://api.github.com/repos/simonw/datasette/issues/1528", "id": 1151887842, "node_id": "IC_kwDOBm6k_c5EqGni", "user": {"value": 25778, "label": "eyeseast"}, "created_at": "2022-06-10T03:23:08Z", "updated_at": "2022-06-10T03:23:08Z", "author_association": "CONTRIBUTOR", "body": "I just put together a version of this in a plugin: https://github.com/eyeseast/datasette-query-files. Happy to have any feedback.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1060631257, "label": "Add new `\"sql_file\"` key to Canned Queries in metadata?"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1742#issuecomment-1128064864", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1742", "id": 1128064864, "node_id": "IC_kwDOBm6k_c5DPOdg", "user": {"value": 25778, "label": "eyeseast"}, "created_at": "2022-05-16T19:42:13Z", "updated_at": "2022-05-16T19:42:13Z", "author_association": "CONTRIBUTOR", "body": "Just to add a wrinkle here, this loads fine: https://alltheplaces-datasette.fly.dev/alltheplaces/places.geojson?_trace=1\r\n\r\nBut also, this doesn't add any trace data: https://alltheplaces-datasette.fly.dev/alltheplaces/places.json?_trace=1\r\n\r\nWhat am I missing?", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1237586379, "label": "?_trace=1 fails with datasette-geojson for some reason"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1742#issuecomment-1128049716", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1742", "id": 1128049716, "node_id": "IC_kwDOBm6k_c5DPKw0", "user": {"value": 25778, "label": "eyeseast"}, "created_at": "2022-05-16T19:24:44Z", "updated_at": "2022-05-16T19:24:44Z", "author_association": "CONTRIBUTOR", "body": "Where is `_trace` getting injected? And is it something a plugin should be able to handle? (If it is, I guess I should handle it in this case.)", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1237586379, "label": "?_trace=1 fails with datasette-geojson for some reason"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/741#issuecomment-1125342229", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/741", "id": 1125342229, "node_id": "IC_kwDOBm6k_c5DE1wV", "user": {"value": 25778, "label": "eyeseast"}, "created_at": "2022-05-12T19:21:16Z", "updated_at": "2022-05-12T19:21:16Z", "author_association": "CONTRIBUTOR", "body": "Came here to check if this had been flagged already. Was helping a colleague get something on Cloud Run and had to dig to find `--extra-options=\"--setting sql_time_limit_ms 2500\"`.\r\n\r\nIf I get some time next week, maybe I'll try to tackle it. 
Would definitely make things easier to be able to do something like this:\r\n\r\n```sh\r\ndatasette publish cloudrun something.db --setting sql_time_limit_ms 2500\r\n```\r\n\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 607223136, "label": "Replace \"datasette publish --extra-options\" with \"--setting\""}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1728#issuecomment-1111752676", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1728", "id": 1111752676, "node_id": "IC_kwDOBm6k_c5CQ__k", "user": {"value": 127565, "label": "wragge"}, "created_at": "2022-04-28T05:11:54Z", "updated_at": "2022-04-28T05:11:54Z", "author_association": "CONTRIBUTOR", "body": "And in terms of the bug, yep I agree that option 2 would be the most useful and least frustrating.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1218133366, "label": "Writable canned queries fail with useless non-error against immutable databases"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1728#issuecomment-1111751734", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1728", "id": 1111751734, "node_id": "IC_kwDOBm6k_c5CQ_w2", "user": {"value": 127565, "label": "wragge"}, "created_at": "2022-04-28T05:09:59Z", "updated_at": "2022-04-28T05:09:59Z", "author_association": "CONTRIBUTOR", "body": "Thanks, I'll give it a try!", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1218133366, "label": "Writable canned queries fail with useless non-error against immutable databases"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1728#issuecomment-1111712953", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1728", "id": 1111712953, "node_id": "IC_kwDOBm6k_c5CQ2S5", "user": {"value": 127565, "label": "wragge"}, "created_at": "2022-04-28T03:48:36Z", "updated_at": "2022-04-28T03:48:36Z", "author_association": "CONTRIBUTOR", "body": "I don't think that'd work for this project. The db is very big, and my aim was to have an environment where researchers could be making use of the data, but be easily able to add corrections to the HTR/OCR extracted data when they came across problems. It's in its immutable (!) form here: https://sydney-stock-exchange-xqtkxtd5za-ts.a.run.app/stock_exchange/stocks", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1218133366, "label": "Writable canned queries fail with useless non-error against immutable databases"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1728#issuecomment-1111705323", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1728", "id": 1111705323, "node_id": "IC_kwDOBm6k_c5CQ0br", "user": {"value": 127565, "label": "wragge"}, "created_at": "2022-04-28T03:32:06Z", "updated_at": "2022-04-28T03:32:06Z", "author_association": "CONTRIBUTOR", "body": "Ah, that would be it! I have a core set of data which doesn't change to which I want authorised users to be able to submit corrections. 
I was going to deal with the persistence issue by just grabbing the user corrections at regular intervals and saving to GitHub. I might need to rethink. Thanks!", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1218133366, "label": "Writable canned queries fail with useless non-error against immutable databases"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1101#issuecomment-1105642187", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1101", "id": 1105642187, "node_id": "IC_kwDOBm6k_c5B5sLL", "user": {"value": 25778, "label": "eyeseast"}, "created_at": "2022-04-21T18:59:08Z", "updated_at": "2022-04-21T18:59:08Z", "author_association": "CONTRIBUTOR", "body": "Ha! That was your idea (and a good one).\r\n\r\nBut it's probably worth measuring to see what overhead it adds. It did require both passing in the database and making the whole thing `async`. \r\n\r\nJust timing the queries themselves:\r\n\r\n1. [Using `AsGeoJSON(geometry) as geometry`](https://alltheplaces-datasette.fly.dev/alltheplaces?sql=select%0D%0A++id%2C%0D%0A++properties%2C%0D%0A++AsGeoJSON%28geometry%29+as+geometry%2C%0D%0A++spider%0D%0Afrom%0D%0A++places%0D%0Aorder+by%0D%0A++id%0D%0Alimit%0D%0A++1000) takes 10.235 ms\r\n2. [Leaving as binary](https://alltheplaces-datasette.fly.dev/alltheplaces?sql=select%0D%0A++id%2C%0D%0A++properties%2C%0D%0A++geometry%2C%0D%0A++spider%0D%0Afrom%0D%0A++places%0D%0Aorder+by%0D%0A++id%0D%0Alimit%0D%0A++1000) takes 8.63 ms\r\n\r\nLooking at the network panel:\r\n\r\n1. Takes about 200 ms for the `fetch` request\r\n2. Takes about 300 ms\r\n\r\nI'm not sure how best to time the GeoJSON generation, but it would be interesting to check. Maybe I'll write a plugin to add query times to response headers.\r\n\r\nThe other thing to consider with async streaming is that it might be well-suited for a slower response. When I have to get the whole result and send a response in a fixed amount of time, I need the most efficient query possible. If I can hang onto a connection and get things one chunk at a time, maybe it's ok if there's some overhead.\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 749283032, "label": "register_output_renderer() should support streaming data"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1101#issuecomment-1105588651", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1101", "id": 1105588651, "node_id": "IC_kwDOBm6k_c5B5fGr", "user": {"value": 25778, "label": "eyeseast"}, "created_at": "2022-04-21T18:15:39Z", "updated_at": "2022-04-21T18:15:39Z", "author_association": "CONTRIBUTOR", "body": "What if you split rendering and streaming into two things:\r\n\r\n- `render` is a function that returns a response\r\n- `stream` is a function that sends chunks, or yields chunks passed to an ASGI `send` callback\r\n\r\nThat way current plugins still work, and streaming is purely additive. 
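To make that concrete, here's a minimal sketch of the two shapes (names and signatures here are invented for illustration, with CSV standing in for any format):\r\n\r\n```python\r\nimport csv\r\nimport io\r\n\r\n\r\ndef render(columns, rows):\r\n    # current pattern: build the entire body in memory, return it in one response\r\n    out = io.StringIO()\r\n    writer = csv.writer(out)\r\n    writer.writerow(columns)\r\n    writer.writerows(rows)\r\n    return out.getvalue()\r\n\r\n\r\nasync def stream(columns, rows, send):\r\n    # additive pattern: encode one chunk at a time and hand each chunk to an\r\n    # ASGI-style send callable, so the full result never sits in memory\r\n    def encode(row):\r\n        out = io.StringIO()\r\n        csv.writer(out).writerow(row)\r\n        return out.getvalue().encode('utf-8')\r\n\r\n    await send(encode(columns))\r\n    for row in rows:\r\n        await send(encode(row))\r\n```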
A `stream` function could get a cursor or iterator of rows, instead of a list, so it could more efficiently handle large queries.\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 749283032, "label": "register_output_renderer() should support streaming data"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1713#issuecomment-1103312860", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1713", "id": 1103312860, "node_id": "IC_kwDOBm6k_c5Bwzfc", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-04-20T00:52:19Z", "updated_at": "2022-04-20T00:52:19Z", "author_association": "CONTRIBUTOR", "body": "feels related to #1402 ", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1203943272, "label": "Datasette feature for publishing snapshots of query results"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1713#issuecomment-1099540225", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1713", "id": 1099540225, "node_id": "IC_kwDOBm6k_c5BiacB", "user": {"value": 25778, "label": "eyeseast"}, "created_at": "2022-04-14T19:09:57Z", "updated_at": "2022-04-14T19:09:57Z", "author_association": "CONTRIBUTOR", "body": "I wonder if this overlaps with what I outlined in #1605. You could run something like this:\r\n\r\n```sh\r\ndatasette freeze -d exports/\r\naws s3 cp exports/ s3://my-export-bucket/$(date)\r\n```\r\n\r\nAnd maybe that does what you need. Of course, that plugin isn't built yet. But that's the idea.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1203943272, "label": "Datasette feature for publishing snapshots of query results"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1699#issuecomment-1094453751", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1699", "id": 1094453751, "node_id": "IC_kwDOBm6k_c5BPAn3", "user": {"value": 25778, "label": "eyeseast"}, "created_at": "2022-04-11T01:32:12Z", "updated_at": "2022-04-11T01:32:12Z", "author_association": "CONTRIBUTOR", "body": "Was looking through old issues and realized a bunch of this got discussed in #1101 (including by me!), so sorry to rehash all this. Happy to help with whatever piece of it I can. 
Would be very excited to be able to use format plugins with exports.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1193090967, "label": "Proposal: datasette query"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1699#issuecomment-1092386254", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1699", "id": 1092386254, "node_id": "IC_kwDOBm6k_c5BHH3O", "user": {"value": 25778, "label": "eyeseast"}, "created_at": "2022-04-08T02:39:25Z", "updated_at": "2022-04-08T02:39:25Z", "author_association": "CONTRIBUTOR", "body": "And just to think this through a little more, here's what `stream_geojson` might look like:\r\n\r\n```python\r\nasync def stream_geojson(datasette, columns, rows, database, stream):\r\n    db = datasette.get_database(database)\r\n    for row in rows:\r\n        feature = await row_to_geojson(row, db)\r\n        stream.write(feature + \"\\n\")  # just assuming newline mode for now\r\n```\r\n\r\nAlternately, that could be an async generator, like this:\r\n\r\n```python\r\nasync def stream_geojson(datasette, columns, rows, database):\r\n    db = datasette.get_database(database)\r\n    for row in rows:\r\n        feature = await row_to_geojson(row, db)\r\n        yield feature\r\n```\r\n\r\nNot sure which makes more sense, but I think this pattern would open up a lot of possibilities. If you had your [stream_indented_json](https://til.simonwillison.net/python/output-json-array-streaming) function, you could do `yield from stream_indented_json(rows, 2)` and be on your way.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1193090967, "label": "Proposal: datasette query"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1699#issuecomment-1092370880", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1699", "id": 1092370880, "node_id": "IC_kwDOBm6k_c5BHEHA", "user": {"value": 25778, "label": "eyeseast"}, "created_at": "2022-04-08T02:07:40Z", "updated_at": "2022-04-08T02:07:40Z", "author_association": "CONTRIBUTOR", "body": "So maybe `register_output_renderer` returns something like this:\r\n\r\n```python\r\n@hookimpl\r\ndef register_output_renderer(datasette):\r\n    return {\r\n        \"extension\": \"geojson\",\r\n        \"render\": render_geojson,\r\n        \"stream\": stream_geojson,\r\n        \"can_render\": can_render_geojson,\r\n    }\r\n```\r\n\r\nAnd stream gets an iterator, instead of a list of rows, so it can efficiently handle large queries. Maybe it also gets passed a destination stream, or it returns an iterator. I'm not sure what makes more sense. 
Either way, that might cover both CLI exports and streaming responses.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1193090967, "label": "Proposal: datasette query"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1699#issuecomment-1092357672", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1699", "id": 1092357672, "node_id": "IC_kwDOBm6k_c5BHA4o", "user": {"value": 25778, "label": "eyeseast"}, "created_at": "2022-04-08T01:39:40Z", "updated_at": "2022-04-08T01:39:40Z", "author_association": "CONTRIBUTOR", "body": "> My best thought on how to differentiate them so far is plugins: if Datasette plugins that provide alternative outputs - like .geojson and .yml and suchlike - also work for the datasette query command that would make a lot of sense to me.\r\n\r\nThat's my thinking, too. It's really the thing I've been wanting since writing `datasette-geojson`, since I'm always exporting with `datasette --get`. The workflow I'm always looking for is something like this:\r\n\r\n```sh\r\ncd alltheplaces-datasette\r\ndatasette query dunkin_in_suffolk -f geojson -o dunkin_in_suffolk.geojson\r\n```\r\n\r\nI think this probably needs either a new plugin hook separate from `register_output_renderer` or a way to use that without going through the HTTP stack. Or maybe a render mode that writes to a stream instead of a response. Maybe there's a new key in the dictionary that `register_output_renderer` returns that handles CLI exports.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1193090967, "label": "Proposal: datasette query"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1549#issuecomment-1087428593", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1549", "id": 1087428593, "node_id": "IC_kwDOBm6k_c5A0Nfx", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-04-04T11:17:13Z", "updated_at": "2022-04-04T11:17:13Z", "author_association": "CONTRIBUTOR", "body": "another way to get the behavior of downloading the file is to use the download attribute of the anchor tag\r\n\r\nhttps://developer.mozilla.org/en-US/docs/Web/HTML/Element/a#attr-download", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1077620955, "label": "Redesign CSV export to improve usability"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1688#issuecomment-1079806857", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1688", "id": 1079806857, "node_id": "IC_kwDOBm6k_c5AXIuJ", "user": {"value": 9020979, "label": "hydrosquall"}, "created_at": "2022-03-27T01:01:14Z", "updated_at": "2022-03-27T01:01:14Z", "author_association": "CONTRIBUTOR", "body": "Thank you! 
I went through the cookiecutter template, and published my first package here: https://github.com/hydrosquall/datasette-nteract-data-explorer", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1181432624, "label": "[plugins][documentation] Is it possible to serve per-plugin static folders when writing one-off (single file) plugins?"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1688#issuecomment-1079550754", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1688", "id": 1079550754, "node_id": "IC_kwDOBm6k_c5AWKMi", "user": {"value": 9020979, "label": "hydrosquall"}, "created_at": "2022-03-26T01:27:27Z", "updated_at": "2022-03-26T03:16:29Z", "author_association": "CONTRIBUTOR", "body": "> Is there a way to serve static assets when using the plugins/ directory method instead of installing plugins as a new python package?\r\n\r\nAs a workaround, I found I can serve my statics from a non-plugin-specific folder using the [--static](https://docs.datasette.io/en/stable/custom_templates.html#serving-static-files) CLI flag.\r\n\r\n```bash\r\ndatasette ~/Library/Safari/History.db \\\r\n  --plugins-dir=plugins/ \\\r\n  --static assets:dist/\r\n```\r\n\r\nIt's not ideal because it means I'll change the cache pattern path depending on how the plugin is running (via pip install or as a one-off script), but it's usable as a workaround.\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1181432624, "label": "[plugins][documentation] Is it possible to serve per-plugin static folders when writing one-off (single file) plugins?"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1684#issuecomment-1078126065", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1684", "id": 1078126065, "node_id": "IC_kwDOBm6k_c5AQuXx", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-03-24T20:08:56Z", "updated_at": "2022-03-24T20:13:19Z", "author_association": "CONTRIBUTOR", "body": "would be nice if the behavior was\r\n\r\n1. try to facet all the columns\r\n2. for bigger tables try to facet the indexed columns\r\n3. for the biggest tables, turn off autofaceting completely\r\n\r\nThis is based on my assumption that what determines autofaceting is the rarity of unique values. Which may not be true!", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1179998071, "label": "Mechanism for disabling faceting on large tables only"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/399#issuecomment-1077671779", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/399", "id": 1077671779, "node_id": "IC_kwDOCGYnMM5AO_dj", "user": {"value": 25778, "label": "eyeseast"}, "created_at": "2022-03-24T14:11:33Z", "updated_at": "2022-03-24T14:11:43Z", "author_association": "CONTRIBUTOR", "body": "Coming back to this. I was about to add a utility function to [datasette-geojson]() to convert lat/lng columns to geometries. Thankfully I googled first. 
There's a SpatiaLite function for this: [MakePoint](https://www.gaia-gis.it/gaia-sins/spatialite-sql-latest.html#p0).\r\n\r\n```sql\r\nselect MakePoint(longitude, latitude) as geometry from places;\r\n```\r\n\r\nI'm not sure if that would work with `conversions`, since it needs two columns, but it's an option for tables that already have latitude, longitude columns.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1124731464, "label": "Make it easier to insert geometries, with documentation and maybe code"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1581#issuecomment-1077047295", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1581", "id": 1077047295, "node_id": "IC_kwDOBm6k_c5AMm__", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-03-24T04:08:18Z", "updated_at": "2022-03-24T04:08:18Z", "author_association": "CONTRIBUTOR", "body": "this has been addressed by the datasette-hashed-urls plugin", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1089529555, "label": "when hashed urls are turned on, the _memory db has improperly long-lived cache expiry"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/pull/1582#issuecomment-1077047152", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1582", "id": 1077047152, "node_id": "IC_kwDOBm6k_c5AMm9w", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-03-24T04:07:58Z", "updated_at": "2022-03-24T04:07:58Z", "author_association": "CONTRIBUTOR", "body": "this has been obviated by the datasette-hashed-urls plugin", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1090055810, "label": "don't set far expiry if hash is '000'"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/131#issuecomment-1067981656", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/131", "id": 1067981656, "node_id": "IC_kwDOCGYnMM4_qBtY", "user": {"value": 25778, "label": "eyeseast"}, "created_at": "2022-03-15T13:21:42Z", "updated_at": "2022-03-15T13:21:42Z", "author_association": "CONTRIBUTOR", "body": "Just ran into this issue last night. I have a big table that's _mostly_ numbers, but also a zip code column in a state where ZIP codes start with 0. 
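In the meantime, a rough sketch of a Python-side workaround (assuming I'm reading the `columns=` type override on `insert_all` right; the `cast` helper is made up for this example):\r\n\r\n```python\r\nimport csv\r\n\r\nimport sqlite_utils\r\n\r\ndb = sqlite_utils.Database('data.db')\r\n\r\n\r\ndef cast(row):\r\n    # hand-rolled type detection, except zipcode always stays a string\r\n    return {\r\n        k: (int(v) if k != 'zipcode' and v.isdigit() else v)\r\n        for k, v in row.items()\r\n    }\r\n\r\n\r\nwith open('file.csv', newline='') as f:\r\n    db['places'].insert_all(\r\n        (cast(row) for row in csv.DictReader(f)),\r\n        columns={'zipcode': str},  # pin the declared column type to TEXT\r\n    )\r\n```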
Would be great to run something like this:\r\n\r\n```sh\r\nsqlite-utils insert data.db places file.csv --csv --detect-types --type zipcode text\r\n```\r\n\r\nMaybe I'll take a crack at this one.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 675753042, "label": "sqlite-utils insert: options for column types"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1384#issuecomment-1066222323", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1384", "id": 1066222323, "node_id": "IC_kwDOBm6k_c4_jULz", "user": {"value": 2670795, "label": "brandonrobertz"}, "created_at": "2022-03-14T00:36:42Z", "updated_at": "2022-03-14T00:36:42Z", "author_association": "CONTRIBUTOR", "body": "> Ah, sorry, I didn't get what you were saying the first time. Using _metadata_local in that way makes total sense -- I agree, refreshing metadata for each cell was seeming quite excessive. Now I'm on the same page! :)\r\n\r\nAll good. Report back any issues you find with this stuff. Metadata/dynamic config hasn't been tested widely outside of what I've done AFAIK. If you find a strong use case for async meta, it's going to be better to know sooner rather than later!", "reactions": "{\"total_count\": 1, \"+1\": 1, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 930807135, "label": "Plugin hook for dynamic metadata"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1384#issuecomment-1066169718", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1384", "id": 1066169718, "node_id": "IC_kwDOBm6k_c4_jHV2", "user": {"value": 2670795, "label": "brandonrobertz"}, "created_at": "2022-03-13T19:48:49Z", "updated_at": "2022-03-13T19:48:49Z", "author_association": "CONTRIBUTOR", "body": "> For my reference, did you include a `render_cell` plugin calling `get_metadata` in those tests?\r\n\r\nYou shouldn't need to do this, as I mentioned previously. The code inside the `render_cell` hook already has access to the most recently sync'd metadata via `datasette._metadata_local`. Refreshing the metadata for every cell seems ... excessive.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 930807135, "label": "Plugin hook for dynamic metadata"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1384#issuecomment-1066006292", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1384", "id": 1066006292, "node_id": "IC_kwDOBm6k_c4_ifcU", "user": {"value": 2670795, "label": "brandonrobertz"}, "created_at": "2022-03-13T02:09:44Z", "updated_at": "2022-03-13T02:09:44Z", "author_association": "CONTRIBUTOR", "body": "> If I'm understanding your plugin code correctly, you query the db using the sync handle every time `get_metadata` is called, right? Won't this become a pretty big bottleneck if a hook into `render_cell` is trying to read metadata / plugin config?\r\n\r\nReading from sqlite DBs is pretty quick and I didn't notice significant performance issues when I was benchmarking. I tested on very large Datasette deployments (hundreds of DBs, millions of rows). 
See [\"Many small queries are efficient in sqlite\"](https://sqlite.org/np1queryprob.html) for more information on the rationale here. Also note that in the [datasette-live-config](https://github.com/next-LI/datasette-live-config) reference plugin, the DB connection is cached, so that eliminated most of the performance worries we had.\r\n\r\nIf you need to ensure fresh metadata is being read inside of a `render_cell` hook specifically, you don't need to do anything further! `get_metadata` gets called before `render_cell` every request, so it already has access to the synced meta. There shouldn't be a need to call `get_metadata(...)` or `metadata(...)` inside `render_cell`, you can just use `datasette._metadata_local` if you're really worried about performance.\r\n\r\n> The plugin is close, but looks like it only grabs remote metadata, is that right? Instead what I'm wanting is to grab metadata embedded in the attached databases.\r\n\r\nYes correct, the datadette-remote-metadata plugin doesn't do that. But the datasette-live-config plugin does. [It supports a `__metadata` table](https://github.com/next-LI/datasette-live-config/blob/main/datasette_live_config/__init__.py#L107-L138) that, when it exists on an attached DB, gets pulled into the Datasette internal `_metadata` and is also accessible via `get_metadata`. Updating is instantaneous so there's no gotchas for users or security issues for users relying on the metadata-based permissions. Simon talked about eventually making something like this a standard feature of Datasette, but I'm not sure what the status is on that!\r\n\r\nGood luck!", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 930807135, "label": "Plugin hook for dynamic metadata"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1384#issuecomment-1065940779", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1384", "id": 1065940779, "node_id": "IC_kwDOBm6k_c4_iPcr", "user": {"value": 2670795, "label": "brandonrobertz"}, "created_at": "2022-03-12T18:49:29Z", "updated_at": "2022-03-12T18:50:07Z", "author_association": "CONTRIBUTOR", "body": "Hello! Just wanted to chime in and note that there's a plugin to have Datasette [watch for updates to an external metadata.yaml/json and update the internal settings accordingly](https://datasette.io/plugins/datasette-remote-metadata), so I think the cache/poll use case is already covered. @khusmann If you don't need truly dynamic metadata then what you've come up with or the plugin ought to work fine.\r\n\r\nMaking the get_metadata async won't improve the situation by itself as only some of the code paths accessing metadata use that hook. The other paths use the internal metadata dict. Trying to force all paths through a async hook would have performance ramifications and making everything use the internal meta will cause problems for users that need changes to take effect immediately. This is why I came to the non-async solution as it was the path of least change within Datasette. 
As always, open to new ideas, etc!", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 930807135, "label": "Plugin hook for dynamic metadata"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/411#issuecomment-1065477258", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/411", "id": 1065477258, "node_id": "IC_kwDOCGYnMM4_geSK", "user": {"value": 25778, "label": "eyeseast"}, "created_at": "2022-03-11T20:14:59Z", "updated_at": "2022-03-11T20:14:59Z", "author_association": "CONTRIBUTOR", "body": "Good call on adding this to `create-table`, especially for stored columns. Having the stored/virtual split might make this tricky to implement, but I haven't gone any farther than thinking about what the CLI looks like. I'm going to try making the SQL side work first and figure that'll tell me more about what it needs.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1160034488, "label": "Support for generated columns"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1655#issuecomment-1062450649", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1655", "id": 1062450649, "node_id": "IC_kwDOBm6k_c4_U7XZ", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-03-09T01:10:46Z", "updated_at": "2022-03-09T01:10:46Z", "author_association": "CONTRIBUTOR", "body": "i increased the max_returned_rows setting, because I have some scripts that get CSVs from this site, and this makes doing pagination of CSVs less annoying for many cases. i know that streaming csvs is something you are hoping to address in 1.0. 
let me know if there's anything i can do to help with that.\r\n\r\nas for what, if anything, can be done about the size of the DOM, I don't have any ideas right now, but i'll poke around.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1163369515, "label": "query result page is using 400mb of browser memory 40x size of html page and 400x size of csv data"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/412#issuecomment-1059647114", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/412", "id": 1059647114, "node_id": "IC_kwDOCGYnMM4_KO6K", "user": {"value": 25778, "label": "eyeseast"}, "created_at": "2022-03-05T01:54:24Z", "updated_at": "2022-03-05T01:54:24Z", "author_association": "CONTRIBUTOR", "body": "I haven't tried this, but it looks like Pandas has a method for this: https://pandas.pydata.org/docs/reference/api/pandas.read_sql_query.html\r\n\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1160182768, "label": "Optional Pandas integration"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1641#issuecomment-1049879118", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1641", "id": 1049879118, "node_id": "IC_kwDOBm6k_c4-k-JO", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-02-24T13:49:26Z", "updated_at": "2022-02-24T13:49:26Z", "author_association": "CONTRIBUTOR", "body": "maybe worth considering adding buttons for paren, asterisk, etc. under the input text box on mobile?", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1149310456, "label": "Tweak mobile keyboard settings"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/pull/407#issuecomment-1040998433", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/407", "id": 1040998433, "node_id": "IC_kwDOCGYnMM4-DGAh", "user": {"value": 25778, "label": "eyeseast"}, "created_at": "2022-02-16T01:29:39Z", "updated_at": "2022-02-16T01:29:39Z", "author_association": "CONTRIBUTOR", "body": "Happy to do it and have it in the library. Going to use it a bunch. 
This whole SpatiaLite toolchain has become a huge part of my work in the past year.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1138948786, "label": "Add SpatiaLite helpers to CLI"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/pull/407#issuecomment-1040580250", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/407", "id": 1040580250, "node_id": "IC_kwDOCGYnMM4-Bf6a", "user": {"value": 25778, "label": "eyeseast"}, "created_at": "2022-02-15T17:40:00Z", "updated_at": "2022-02-15T17:40:00Z", "author_association": "CONTRIBUTOR", "body": "@simonw I think this is ready for a look.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1138948786, "label": "Add SpatiaLite helpers to CLI"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/398#issuecomment-1038336591", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/398", "id": 1038336591, "node_id": "IC_kwDOCGYnMM4948JP", "user": {"value": 25778, "label": "eyeseast"}, "created_at": "2022-02-13T18:48:21Z", "updated_at": "2022-02-13T18:49:49Z", "author_association": "CONTRIBUTOR", "body": "Been chipping away at this between other things and realized `sqlite-utils init-spatialite` is probably unnecessary. Any of the other commands requires running `db.init_spatialite` to have the extension functions available, and that will do everything `init-spatialite` would do.\r\n\r\nI think it's probably worth keeping a SpatiaLite flag on `create-database` in case you wanted to create all the spatial metadata up front. Otherwise, it's going to get added the first time you run `add-geometry-column` or `create-spatial-index`, which is probably fine in most cases.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1124237013, "label": "Add SpatiaLite helpers to CLI"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/402#issuecomment-1035057014", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/402", "id": 1035057014, "node_id": "IC_kwDOCGYnMM49sbd2", "user": {"value": 25778, "label": "eyeseast"}, "created_at": "2022-02-10T15:30:28Z", "updated_at": "2022-02-10T15:30:40Z", "author_association": "CONTRIBUTOR", "body": "Yeah, the CLI experience is probably where any kind of multi-column, configured setup is going to fall apart. 
Sticking with GIS examples, one way I might think about this is using the [fiona CLI](https://fiona.readthedocs.io/en/latest/cli.html):\r\n\r\n```sh\r\n# assuming a database is already created and has SpatiaLite\r\nfio cat boundary.shp | sqlite-utils insert spatial.db boundaries --conversion geometry GeometryGeoJSON -\r\n```\r\n\r\nAnyway, very interested to see where you land here.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1125297737, "label": "Advanced class-based `conversions=` mechanism"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/403#issuecomment-1033332570", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/403", "id": 1033332570, "node_id": "IC_kwDOCGYnMM49l2da", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-02-09T04:22:43Z", "updated_at": "2022-02-09T04:22:43Z", "author_association": "CONTRIBUTOR", "body": "dddoooope", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1126692066, "label": "Document how to add a primary key to a rowid table using `sqlite-utils transform --pk`"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/402#issuecomment-1032732242", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/402", "id": 1032732242, "node_id": "IC_kwDOCGYnMM49jj5S", "user": {"value": 25778, "label": "eyeseast"}, "created_at": "2022-02-08T15:26:59Z", "updated_at": "2022-02-08T15:26:59Z", "author_association": "CONTRIBUTOR", "body": "What if you did something like this:\r\n\r\n```python\r\nclass Conversion:\r\n    def __init__(self, *args, **kwargs):\r\n        \"Put whatever settings you need here\"\r\n\r\n    def python(self, row, column, value):  # not sure on args here\r\n        \"Python step to transform value\"\r\n        return value\r\n\r\n    def sql(self, row, column, value):\r\n        \"Return the actual sql that goes in the insert/update step, and maybe params\"\r\n        # value is the return of self.python()\r\n        return value, []\r\n```\r\n\r\nThis way, you're always passing an instance, which has methods that do the conversion. (Or you're passing a SQL string, as you would now.) The `__init__` could take column names, or SRID, or whatever other setup state you need per row, but the row is getting processed with the `python` and `sql` methods (or whatever you want to call them). This is pretty rough, so do what you will with names and args and such.\r\n\r\nYou'd then use it like this:\r\n\r\n```python\r\n# subclass might be unneeded here, if methods are present\r\nclass LngLatConversion(Conversion):\r\n    def __init__(self, x=\"longitude\", y=\"latitude\"):\r\n        self.x = x\r\n        self.y = y\r\n\r\n    def python(self, row, column, value):\r\n        x = row[self.x]\r\n        y = row[self.y]\r\n        return x, y\r\n\r\n    def sql(self, row, column, value):\r\n        # value is now a tuple, returned above\r\n        s = \"GeomFromText(POINT(? ?))\"\r\n        return s, value\r\n\r\ntable.insert_all(rows, conversions={\"point\": LngLatConversion(\"lng\", \"lat\")})\r\n```\r\n\r\nI haven't thought through all the implementation details here, and it'll probably break in ways I haven't foreseen, but wanted to get this idea out of my head. 
Hope it helps.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1125297737, "label": "Advanced class-based `conversions=` mechanism"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/403#issuecomment-1032126353", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/403", "id": 1032126353, "node_id": "IC_kwDOCGYnMM49hP-R", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-02-08T01:45:15Z", "updated_at": "2022-02-08T01:45:31Z", "author_association": "CONTRIBUTOR", "body": "you can hack something like this to achieve this result:\r\n\r\n`sqlite-utils convert my_database my_table rowid \"{'id': value}\" --multi`", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1126692066, "label": "Document how to add a primary key to a rowid table using `sqlite-utils transform --pk`"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/26#issuecomment-1032120014", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/26", "id": 1032120014, "node_id": "IC_kwDOCGYnMM49hObO", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-02-08T01:32:34Z", "updated_at": "2022-02-08T01:32:34Z", "author_association": "CONTRIBUTOR", "body": "if you are curious about prior art, https://github.com/jsnell/json-to-multicsv is really good!", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 455486286, "label": "Mechanism for turning nested JSON into foreign keys / many-to-many"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/402#issuecomment-1031791783", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/402", "id": 1031791783, "node_id": "IC_kwDOCGYnMM49f-Sn", "user": {"value": 25778, "label": "eyeseast"}, "created_at": "2022-02-07T18:37:40Z", "updated_at": "2022-02-07T18:37:40Z", "author_association": "CONTRIBUTOR", "body": "I've never used it either, but it's interesting, right? Feel like I should try it for something. 
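For reference, the stdlib version of the idea is small enough to demo inline (a self-contained sketch):\r\n\r\n```python\r\nimport sqlite3\r\nfrom datetime import date\r\n\r\n# adapter: turns a Python value into something SQLite can store\r\nsqlite3.register_adapter(date, lambda d: d.isoformat())\r\n# converter: turns a stored value back into a Python object on read\r\nsqlite3.register_converter('DATE', lambda b: date.fromisoformat(b.decode()))\r\n\r\nconn = sqlite3.connect(':memory:', detect_types=sqlite3.PARSE_DECLTYPES)\r\nconn.execute('create table events (day DATE)')\r\nconn.execute('insert into events values (?)', (date(2022, 2, 7),))\r\nprint(conn.execute('select day from events').fetchone())\r\n# (datetime.date(2022, 2, 7),)\r\n```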
\r\n\r\nI'm trying to get my head around how this conversions feature might work, because I really like the idea of it.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1125297737, "label": "Advanced class-based `conversions=` mechanism"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/402#issuecomment-1031779460", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/402", "id": 1031779460, "node_id": "IC_kwDOCGYnMM49f7SE", "user": {"value": 25778, "label": "eyeseast"}, "created_at": "2022-02-07T18:24:56Z", "updated_at": "2022-02-07T18:24:56Z", "author_association": "CONTRIBUTOR", "body": "I wonder if there's any overlap with the goals here and the `sqlite3` module's concept of adapters and converters: https://docs.python.org/3/library/sqlite3.html#sqlite-and-python-types\r\n\r\nI'm not sure that's _exactly_ what we're talking about here, but it might be a parallel with some useful ideas to borrow.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1125297737, "label": "Advanced class-based `conversions=` mechanism"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/pull/1593#issuecomment-1031455498", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1593", "id": 1031455498, "node_id": "IC_kwDOBm6k_c49esMK", "user": {"value": 49699333, "label": "dependabot[bot]"}, "created_at": "2022-02-07T13:13:22Z", "updated_at": "2022-02-07T13:13:22Z", "author_association": "CONTRIBUTOR", "body": "Superseded by #1631.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1101705012, "label": "Update pytest-asyncio requirement from <0.17,>=0.10 to >=0.10,<0.18"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/399#issuecomment-1030741289", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/399", "id": 1030741289, "node_id": "IC_kwDOCGYnMM49b90p", "user": {"value": 25778, "label": "eyeseast"}, "created_at": "2022-02-06T03:03:43Z", "updated_at": "2022-02-06T03:03:43Z", "author_association": "CONTRIBUTOR", "body": "> I wonder if there are any interesting non-geospatial canned conversions that it would be worth including?\r\n\r\nOff the top of my head:\r\n\r\n- Un-nesting JSON objects into columns\r\n- Splitting arrays\r\n- Normalizing dates and times\r\n- URL munging with `urlparse`\r\n- Converting strings to numbers\r\n\r\nSome of this is easy enough with SQL functions, some is easier in Python. 
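For example, a date-normalizing conversion might be as small as this (a hypothetical class, following the callable pattern floated elsewhere in this thread):\r\n\r\n```python\r\nfrom datetime import datetime\r\n\r\n\r\nclass ParseDate:\r\n    \"\"\"Normalize strings like '02/05/2022' to ISO 8601 dates.\"\"\"\r\n\r\n    def __init__(self, fmt='%m/%d/%Y'):\r\n        self.fmt = fmt\r\n\r\n    def __call__(self, value):\r\n        return datetime.strptime(value, self.fmt).date().isoformat()\r\n\r\n\r\n# ParseDate()('02/05/2022') -> '2022-02-05'\r\n```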
Maybe that's where having pre-built classes gets really handy, because it saves you from thinking about which way it's implemented.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1124731464, "label": "Make it easier to insert geometries, with documentation and maybe code"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/399#issuecomment-1030740826", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/399", "id": 1030740826, "node_id": "IC_kwDOCGYnMM49b9ta", "user": {"value": 25778, "label": "eyeseast"}, "created_at": "2022-02-06T02:59:10Z", "updated_at": "2022-02-06T02:59:10Z", "author_association": "CONTRIBUTOR", "body": "All this said, I don't think it's unreasonable to point people to dedicated tools like `geojson-to-sqlite`. If I'm dealing with a bunch of GeoJSON or Shapefiles, I need something to read those anyway (or I need to figure out virtual tables). But something like this might make it easier to build those libraries, or standardize the underlying parts.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1124731464, "label": "Make it easier to insert geometries, with documentation and maybe code"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/399#issuecomment-1030740653", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/399", "id": 1030740653, "node_id": "IC_kwDOCGYnMM49b9qt", "user": {"value": 25778, "label": "eyeseast"}, "created_at": "2022-02-06T02:57:17Z", "updated_at": "2022-02-06T02:57:17Z", "author_association": "CONTRIBUTOR", "body": "I like the idea of having stock conversions you could import. I'd actually move them to a dedicated module (call it `sqlite_utils.conversions` or something), because it's different from other utilities. Maybe they even take configuration, or they're composable.\r\n\r\n```python\r\nfrom sqlite_utils.conversions import LongitudeLatitude\r\n\r\ndb[\"places\"].insert(\r\n    {\r\n        \"name\": \"London\",\r\n        \"lng\": -0.118092,\r\n        \"lat\": 51.509865,\r\n    },\r\n    conversions={\"point\": LongitudeLatitude(\"lng\", \"lat\")},\r\n)\r\n```\r\n\r\nI would definitely use that for every CSV I get with lat/lng columns where I actually need GeoJSON.", "reactions": "{\"total_count\": 1, \"+1\": 1, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1124731464, "label": "Make it easier to insert geometries, with documentation and maybe code"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/398#issuecomment-1030629879", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/398", "id": 1030629879, "node_id": "IC_kwDOCGYnMM49bin3", "user": {"value": 25778, "label": "eyeseast"}, "created_at": "2022-02-05T13:57:33Z", "updated_at": "2022-02-05T19:49:38Z", "author_association": "CONTRIBUTOR", "body": "I'm mostly using [geojson-to-sqlite](https://github.com/simonw/geojson-to-sqlite) at the moment. Even with shapefiles, I'm usually converting to GeoJSON and projecting to EPSG:4326 (with [ogr2ogr](https://gdal.org/programs/ogr2ogr.html)) first. \r\n\r\nI think an open question here is how much you want to leave to external libraries and how much you want here. 
My thinking has been that adding Spatialite helpers here would make external stuff easier, but it would be nice to have some standard way to insert geometries.\r\n\r\nI'm in the middle of adding GeoJSON and Spatialite support to [geocode-sqlite](https://github.com/eyeseast/geocode-sqlite), and that will probably use WKT. Since that's all points, I think I can just make the string inline. But for polygons, I'd generally use Shapely, which probably isn't a dependency you want to add to sqlite-utils.\r\n\r\nI've also been trying to get some of the approaches [here](https://www.gaia-gis.it/fossil/libspatialite/wiki?name=Supporting+GeoJSON) to work, but haven't had any success so far.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1124237013, "label": "Add SpatiaLite helpers to CLI"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/pull/385#issuecomment-1030002502", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/385", "id": 1030002502, "node_id": "IC_kwDOCGYnMM49ZJdG", "user": {"value": 25778, "label": "eyeseast"}, "created_at": "2022-02-04T13:50:19Z", "updated_at": "2022-02-04T13:50:19Z", "author_association": "CONTRIBUTOR", "body": "Awesome. Thanks for your help getting it in. Will now look at adding CLI versions of this. It's going to be super helpful on a bunch of my projects.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1102899312, "label": "Add new spatialite helper methods"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/pull/385#issuecomment-1029370537", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/385", "id": 1029370537, "node_id": "IC_kwDOCGYnMM49WvKp", "user": {"value": 25778, "label": "eyeseast"}, "created_at": "2022-02-03T20:25:58Z", "updated_at": "2022-02-03T20:25:58Z", "author_association": "CONTRIBUTOR", "body": "OK, I moved all the GIS helpers into `db.py` as methods on `Database` and `Table`, and I put `find_spatialite` back in `utils.py`. I deleted `gis.py`, since there's nothing left in it. Docs and tests are updated and passing.\r\n\r\nI think this is better.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1102899312, "label": "Add new spatialite helper methods"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/pull/385#issuecomment-1029338360", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/385", "id": 1029338360, "node_id": "IC_kwDOCGYnMM49WnT4", "user": {"value": 25778, "label": "eyeseast"}, "created_at": "2022-02-03T19:43:56Z", "updated_at": "2022-02-03T19:43:56Z", "author_association": "CONTRIBUTOR", "body": "Works for me. I was just looking at how the FTS extensions work and they're just methods, too. 
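Something like this, side by side (the `enable_fts` call is the existing API; the GIS lines use the method names from this PR, and assume SpatiaLite is installed):\r\n\r\n```python\r\nimport sqlite_utils\r\n\r\ndb = sqlite_utils.Database('spatial.db')\r\ndb['places'].create({'name': str, 'address': str})\r\n\r\n# existing pattern: FTS support is exposed as table methods\r\ndb['places'].enable_fts(['name', 'address'])\r\n\r\n# same shape for the GIS helpers in this PR\r\ndb.init_spatialite()\r\ndb['places'].add_geometry_column('geometry', 'POINT')\r\ndb['places'].create_spatial_index('geometry')\r\n```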
So this can be consistent with that.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1102899312, "label": "Add new spatialite helper methods"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/pull/385#issuecomment-1029326568", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/385", "id": 1029326568, "node_id": "IC_kwDOCGYnMM49Wkbo", "user": {"value": 25778, "label": "eyeseast"}, "created_at": "2022-02-03T19:28:26Z", "updated_at": "2022-02-03T19:28:26Z", "author_association": "CONTRIBUTOR", "body": "> `from sqlite_utils.utils import find_spatialite` is part of the documented API already:\r\n> \r\n> https://sqlite-utils.datasette.io/en/3.22.1/python-api.html#finding-spatialite\r\n> \r\n> To avoid needing to bump the major version number to 4 to indicate a backwards incompatible change, we should keep a `from .gis import find_spatialite` line at the top of `utils.py` such that any existing code with that documented import continues to work.\r\n\r\nThis is fixed now. I had to take out the type annotations for `Database` and `Table` to avoid a circular import, but that's fine and may be moot if these become class methods.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1102899312, "label": "Add new spatialite helper methods"}, "performed_via_github_app": null}