{"html_url": "https://github.com/simonw/sqlite-utils/issues/491#issuecomment-1258712931", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/491", "id": 1258712931, "node_id": "IC_kwDOCGYnMM5LBm9j", "user": {"value": 25778, "label": "eyeseast"}, "created_at": "2022-09-26T22:31:58Z", "updated_at": "2022-09-26T22:31:58Z", "author_association": "CONTRIBUTOR", "body": "Right. The backup command will copy tables completely, but in the case of conflicting table names, the destination gets overwritten silently. That might not be what you want here. ", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1383646615, "label": "Ability to merge databases and tables"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/491#issuecomment-1258508215", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/491", "id": 1258508215, "node_id": "IC_kwDOCGYnMM5LA0-3", "user": {"value": 25778, "label": "eyeseast"}, "created_at": "2022-09-26T19:22:14Z", "updated_at": "2022-09-26T19:22:14Z", "author_association": "CONTRIBUTOR", "body": "This might be fairly straightforward using SQLite's backup utility: https://docs.python.org/3/library/sqlite3.html#sqlite3.Connection.backup\r\n\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1383646615, "label": "Ability to merge databases and tables"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/526#issuecomment-1258337011", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/526", "id": 1258337011, "node_id": "IC_kwDOBm6k_c5LALLz", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-09-26T16:49:48Z", "updated_at": "2022-09-26T16:49:48Z", "author_association": "CONTRIBUTOR", "body": "i think the smallest change that gets close to what i want is to change the behavior so that `max_returned_rows` is not applied in the `execute` method when we are asking for a csv of a query.\r\n\r\nthere are some infelicities with that approach, but i'll make a PR to make it easier to discuss.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 459882902, "label": "Stream all results for arbitrary SQL and canned queries"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/526#issuecomment-1258167564", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/526", "id": 1258167564, "node_id": "IC_kwDOBm6k_c5K_h0M", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-09-26T14:57:44Z", "updated_at": "2022-09-26T15:08:36Z", "author_association": "CONTRIBUTOR", "body": "reading the database execute method i have a few questions.\r\n\r\nhttps://github.com/simonw/datasette/blob/cb1e093fd361b758120aefc1a444df02462389a3/datasette/database.py#L229-L242\r\n\r\n---\r\nunless i'm missing something (which is very likely!!), the `max_returned_rows` argument doesn't actually offer any protection against running very expensive queries. \r\n\r\nIt's not like adding a `LIMIT max_rows` argument. it makes sense that it isn't, because the query could already have a `LIMIT` clause of its own. 
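For illustration, here is a minimal standalone sketch of the subquery-wrapping idea the comment goes on to evaluate; `wrap_with_limit` is a hypothetical helper, not datasette's actual code, and it only caps the rows returned, not the work the inner query does:

```python
import sqlite3


def wrap_with_limit(query: str, max_returned_rows: int) -> str:
    # Hypothetical helper: wrap an arbitrary SELECT in a subquery so an outer
    # LIMIT applies even if the inner query carries its own LIMIT. This caps
    # how many rows come back, but the inner query may still be expensive.
    return f"select * from ({query}) limit {max_returned_rows}"


conn = sqlite3.connect(":memory:")
conn.execute("create table t (n integer)")
conn.executemany("insert into t values (?)", [(i,) for i in range(10)])

rows = conn.execute(wrap_with_limit("select n from t order by n", 3)).fetchall()
print(rows)  # [(0,), (1,), (2,)] -- only three rows are returned to Python
```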
Doing something like `select * from (query) limit {max_returned_rows}` **might** be protective but wouldn't always be.\r\n\r\nInstead, the code executes the full original query, and if it still has time, it fetches out the first `max_rows + 1` rows. \r\n\r\nthis *does* offer some protection against memory exhaustion, as you won't hydrate a huge result set into python (however, there are [data flow patterns](https://github.com/simonw/datasette/issues/1727#issuecomment-1258129113) that could avoid that too)\r\n\r\ngiven the current architecture, i don't see how creating a new connection would be of use?\r\n\r\n---\r\n\r\nIf we just removed the `max_returned_rows` limitation, then i think most things would be fine **except** for the QueryViews. Right now, rendering just [5000 rows takes a lot of client-side memory](https://github.com/simonw/datasette/issues/1655), so some form of pagination would be required.\r\n\r\n\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 459882902, "label": "Stream all results for arbitrary SQL and canned queries"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1655#issuecomment-1258166572", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1655", "id": 1258166572, "node_id": "IC_kwDOBm6k_c5K_hks", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-09-26T14:57:04Z", "updated_at": "2022-09-26T14:57:04Z", "author_association": "CONTRIBUTOR", "body": "I think that paginating, even in javascript, could be very helpful. Maybe render json or csv into the page and let javascript load that into the dom?", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1163369515, "label": "query result page is using 400mb of browser memory 40x size of html page and 400x size of csv data"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1727#issuecomment-1258129113", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1727", "id": 1258129113, "node_id": "IC_kwDOBm6k_c5K_YbZ", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-09-26T14:30:11Z", "updated_at": "2022-09-26T14:48:31Z", "author_association": "CONTRIBUTOR", "body": "from your analysis, it seems like the GIL is blocking on loading the data from sqlite to python (particularly in the `fetchmany` call)\r\n\r\nthis is probably a simplistic idea, but what if you had the python code in the `execute` method iterate over the cursor and yield out rows or small chunks of rows?\r\n\r\nsomething like: \r\n```python\r\nwith sqlite_timelimit(conn, time_limit_ms):\r\n    try:\r\n        cursor = conn.cursor()\r\n        cursor.execute(sql, params if params is not None else {})\r\n    except:\r\n        ...\r\n    max_returned_rows = self.ds.max_returned_rows\r\n    if max_returned_rows == page_size:\r\n        max_returned_rows += 1\r\n    if max_returned_rows and truncate:\r\n        for i, row in enumerate(cursor):\r\n            yield row\r\n            if i == max_returned_rows - 1:\r\n                break\r\n    else:\r\n        for row in cursor:\r\n            yield row\r\n        truncated = False\r\n```\r\n\r\nthis kind of thing works well with a postgres server-side cursor, but i'm not sure if it will hold for sqlite. 
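As a quick check of whether lazy iteration holds for sqlite, here is a standalone sketch (not datasette code; `yield_rows` is a hypothetical helper) showing that iterating a `sqlite3` cursor pulls rows incrementally, so a generator can stop early without fetching the whole result set:

```python
import sqlite3


def yield_rows(conn, sql, params=None, max_returned_rows=None):
    # Hypothetical helper: iterate the cursor lazily and stop after
    # max_returned_rows instead of calling fetchall()/fetchmany() up front.
    cursor = conn.execute(sql, params or ())
    for i, row in enumerate(cursor):
        if max_returned_rows is not None and i >= max_returned_rows:
            break
        yield row


conn = sqlite3.connect(":memory:")
conn.execute("create table t (n integer)")
conn.executemany("insert into t values (?)", [(i,) for i in range(1000)])

# Only the first five rows are hydrated into Python; the rest stay in sqlite.
print(list(yield_rows(conn, "select n from t", max_returned_rows=5)))
```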
\r\n\r\nyou would still spend about the same amount of time in python and would be contending for the gil, but it could be non-blocking.\r\n\r\ndepending on the data flow, this could also have some benefit for memory (data stays in more compact sqlite-land until you need it).", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1217759117, "label": "Research: demonstrate if parallel SQL queries are worthwhile"}, "performed_via_github_app": null}
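Circling back to the backup approach floated in the first two comments: a minimal sketch of the stdlib `sqlite3.Connection.backup` call might look like the following (the `source.db` / `merged.db` filenames are made up for illustration). It copies the entire source database over the destination, which is why conflicting tables get replaced rather than merged:

```python
import sqlite3

# Sketch only: Connection.backup() copies the whole source database into the
# destination, replacing whatever is already there -- "copy", not "merge".
source = sqlite3.connect("source.db")       # hypothetical input database
destination = sqlite3.connect("merged.db")  # hypothetical output database

source.backup(destination)

destination.close()
source.close()
```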