{"html_url": "https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008234293", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/364", "id": 1008234293, "node_id": "IC_kwDOCGYnMM48GG81", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-09T05:37:02Z", "updated_at": "2022-01-09T05:37:02Z", "author_association": "OWNER", "body": "Calling `p.stdin.close()` and then `p.wait()` terminates the subprocess.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1095570074, "label": "`--batch-size 1` doesn't seem to commit for every item"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008233910", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/364", "id": 1008233910, "node_id": "IC_kwDOCGYnMM48GG22", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-09T05:32:53Z", "updated_at": "2022-01-09T05:35:45Z", "author_association": "OWNER", "body": "This is strange. The following:\r\n```pycon\r\n>>> import subprocess\r\n>>> p = subprocess.Popen([\"sqlite-utils\", \"insert\", \"/tmp/stream.db\", \"stream\", \"-\", \"--nl\"], stdin=subprocess.PIPE)\r\n>>> p.stdin.write(b'\\n'.join(b'{\"id\": %s}' % str(i).encode(\"utf-8\") for i in range(1000)))\r\n11889\r\n>>> # At this point /tmp/stream.db is still 0 bytes - but if I then run this:\r\n>>> p.stdin.close()\r\n>>> # /tmp/stream.db is now 20K and contains the written data\r\n```\r\nNo wait, mystery solved - I can add `p.stdin.flush()` instead of `p.stdin.close()` and the file suddenly jumps up in size.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1095570074, "label": "`--batch-size 1` doesn't seem to commit for every item"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/369#issuecomment-1008232075", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/369", "id": 1008232075, "node_id": "IC_kwDOCGYnMM48GGaL", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-09T05:13:15Z", "updated_at": "2022-01-09T05:13:56Z", "author_association": "OWNER", "body": "I think the query that will help solve this is:\r\n\r\n`explain query plan select * from ny_times_us_counties where state = 1 and county = 2`\r\n\r\nIn this case, the query planner needs to decide if it should use the index for the `state` column or the index for the `county` column. That's where the statistics come into play. In particular:\r\n\r\n| tbl | idx | stat |\r\n|----------------------|---------------------------------|---------------|\r\n| ny_times_us_counties | idx_ny_times_us_counties_date | 2092871 2915 |\r\n| ny_times_us_counties | idx_ny_times_us_counties_fips | 2092871 651 |\r\n| ny_times_us_counties | idx_ny_times_us_counties_county | 2092871 1085 |\r\n| ny_times_us_counties | idx_ny_times_us_counties_state | 2092871 37373 |\r\n\r\nThose numbers are explained by this comment in the SQLite C code: https://github.com/sqlite/sqlite/blob/5622c7f97106314719740098cf0854e7eaa81802/src/analyze.c#L41-L55\r\n\r\n```\r\n** There is normally one row per index, with the index identified by the\r\n** name in the idx column. The tbl column is the name of the table to\r\n** which the index belongs. 
In each such row, the stat column will be\r\n** a string consisting of a list of integers. The first integer in this\r\n** list is the number of rows in the index. (This is the same as the\r\n** number of rows in the table, except for partial indices.) The second\r\n** integer is the average number of rows in the index that have the same\r\n** value in the first column of the index.\r\n```\r\nSo that table is telling us that using a value in the `county` column will filter down to an average of 1,085 rows, whereas filtering on the `state` column will filter down to an average of 37,373 - so clearly the `county` index is the better index to use here!\r\n\r\nJust one catch: against both my `covid.db` and my `covid-analyzed.db` databases the `county` index is picked for both of them - so SQLite is somehow guessing that `county` is a better index even though it doesn't have statistics for that.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1097091527, "label": "Research how much of a difference analyze / sqlite_stat1 makes"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1008229839", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/365", "id": 1008229839, "node_id": "IC_kwDOCGYnMM48GF3P", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-09T04:51:44Z", "updated_at": "2022-01-09T04:51:44Z", "author_association": "OWNER", "body": "Found one report on Stack Overflow from 9 years ago of someone seeing broken performance after running `ANALYZE`, hard to say that's a trend and not a single weird edge-case though! https://stackoverflow.com/q/12947214/6083", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1096558279, "label": "create-index should run analyze after creating index"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/369#issuecomment-1008229341", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/369", "id": 1008229341, "node_id": "IC_kwDOCGYnMM48GFvd", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-09T04:45:38Z", "updated_at": "2022-01-09T04:47:11Z", "author_association": "OWNER", "body": "This is probably too fancy. 
I think maybe the way to do this is with `select * from [global-power-plants] where \"country_long\" = 'United Kingdom'` - then mess around with stats to see if I can get it to use the index or not based on them.\r\n\r\nHere's the explain for that: https://global-power-plants.datasettes.com/global-power-plants?sql=EXPLAIN+QUERY+PLAN+select+*+from+[global-power-plants]+where+%22country_long%22+%3D+%27United+Kingdom%27", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1097091527, "label": "Research how much of a difference analyze / sqlite_stat1 makes"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/369#issuecomment-1008227625", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/369", "id": 1008227625, "node_id": "IC_kwDOCGYnMM48GFUp", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-09T04:25:38Z", "updated_at": "2022-01-09T04:25:38Z", "author_association": "OWNER", "body": "```sql\r\nEXPLAIN QUERY PLAN select country_long, count(*) from [global-power-plants] group by country_long\r\n```\r\nhttps://global-power-plants.datasettes.com/global-power-plants?sql=EXPLAIN+QUERY+PLAN+select+country_long%2C+count%28*%29+from+%5Bglobal-power-plants%5D+group+by+country_long\r\n\r\n> SCAN TABLE global-power-plants USING COVERING INDEX \"global-power-plants_country_long\"", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1097091527, "label": "Research how much of a difference analyze / sqlite_stat1 makes"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1588#issuecomment-1008227436", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1588", "id": 1008227436, "node_id": "IC_kwDOBm6k_c48GFRs", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-09T04:23:37Z", "updated_at": "2022-01-09T04:25:04Z", "author_association": "OWNER", "body": "Relevant code: https://github.com/simonw/datasette/blob/85849935292e500ab7a99f8fe0f9546e903baad3/datasette/utils/__init__.py#L163-L170\r\n\r\nhttps://github.com/simonw/datasette/blob/85849935292e500ab7a99f8fe0f9546e903baad3/datasette/utils/__init__.py#L195-L204", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1097101917, "label": "`explain query plan select` is too strict about whitespace"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1588#issuecomment-1008227491", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1588", "id": 1008227491, "node_id": "IC_kwDOBm6k_c48GFSj", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-09T04:24:09Z", "updated_at": "2022-01-09T04:24:09Z", "author_association": "OWNER", "body": "I think this is the fix:\r\n```python\r\nre.compile(r\"^explain\\s+query\\s+plan\\s+select\\b\"),\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1097101917, "label": "`explain query plan select` is too strict about whitespace"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/369#issuecomment-1008226862", 
"issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/369", "id": 1008226862, "node_id": "IC_kwDOCGYnMM48GFIu", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-09T04:17:55Z", "updated_at": "2022-01-09T04:17:55Z", "author_association": "OWNER", "body": "There are some clues as to what effect ANALYZE has in https://www.sqlite.org/optoverview.html\r\n\r\nSome quotes:\r\n\r\n> SQLite might use a skip-scan on an index if it knows that the first one or more columns contain many duplication values. If there are too few duplicates in the left-most columns of the index, then it would be faster to simply step ahead to the next value, and thus do a full table scan, than to do a binary search on an index to locate the next left-column value.\r\n>\r\n> The only way that SQLite can know that there are many duplicates in the left-most columns of an index is if the ANALYZE command has been run on the database. Without the results of ANALYZE, SQLite has to guess at the \"shape\" of the data in the table, and the default guess is that there are an average of 10 duplicates for every value in the left-most column of the index. Skip-scan only becomes profitable (it only gets to be faster than a full table scan) when the number of duplicates is about 18 or more. Hence, a skip-scan is never used on a database that has not been analyzed. \r\n\r\nAnd\r\n\r\n> Join reordering is automatic and usually works well enough that programmers do not have to think about it, especially if ANALYZE has been used to gather statistics about the available indexes, though occasionally some hints from the programmer are needed.\r\n\r\nAnd\r\n\r\n> The various sqlite_statN tables contain information on how selective the various indexes are. For example, the sqlite_stat1 table might indicate that an equality constraint on column x reduces the search space to 10 rows on average, whereas an equality constraint on column y reduces the search space to 3 rows on average. In that case, SQLite would prefer to use index ex2i2 since that index is more selective. 
", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1097091527, "label": "Research how much of a difference analyze / sqlite_stat1 makes"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/369#issuecomment-1008226487", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/369", "id": 1008226487, "node_id": "IC_kwDOCGYnMM48GFC3", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-09T04:14:05Z", "updated_at": "2022-01-09T04:14:05Z", "author_association": "OWNER", "body": "Didn't manage to spot a meaningful difference with that database either:\r\n```\r\nanalyze % python3 -m timeit '__import__(\"sqlite3\").connect(\"covid.db\").execute(\"select fips, count(*) from [ny_times_us_counties] group by fips\").fetchall()' \r\n2 loops, best of 5: 101 msec per loop\r\nanalyze % python3 -m timeit '__import__(\"sqlite3\").connect(\"covid-analyzed.db\").execute(\"select fips, count(*) from [ny_times_us_counties] group by fips\").fetchall()'\r\n2 loops, best of 5: 103 msec per loop\r\n```\r\nMaybe `select fips, count(*) from [ny_times_us_counties] group by fips` isn't a good query for testing this?", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1097091527, "label": "Research how much of a difference analyze / sqlite_stat1 makes"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/369#issuecomment-1008220270", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/369", "id": 1008220270, "node_id": "IC_kwDOCGYnMM48GDhu", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-09T03:12:38Z", "updated_at": "2022-01-09T03:13:15Z", "author_association": "OWNER", "body": "Basically no difference using this very basic benchmark:\r\n```\r\nanalyze % python3 -m timeit '__import__(\"sqlite3\").connect(\"global-power-plants.db\").execute(\"select country_long, count(*) from [global-power-plants] group by country_long\").fetchall()'\r\n100 loops, best of 5: 2.39 msec per loop\r\nanalyze % python3 -m timeit '__import__(\"sqlite3\").connect(\"global-power-plants-analyzed.db\").execute(\"select country_long, count(*) from [global-power-plants] group by country_long\").fetchall()'\r\n100 loops, best of 5: 2.38 msec per loop\r\n```\r\nI should try this against a much larger database.\r\n\r\nhttps://covid-19.datasettes.com/covid.db is 879MB.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1097091527, "label": "Research how much of a difference analyze / sqlite_stat1 makes"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/369#issuecomment-1008219844", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/369", "id": 1008219844, "node_id": "IC_kwDOCGYnMM48GDbE", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-09T03:08:09Z", "updated_at": "2022-01-09T03:08:09Z", "author_association": "OWNER", "body": "```\r\nanalyze % sqlite-utils global-power-plants-analyzed.db 'analyze'\r\n[{\"rows_affected\": -1}]\r\nanalyze % sqlite-utils tables global-power-plants-analyzed.db \r\n[{\"table\": \"global-power-plants\"},\r\n {\"table\": 
\"global-power-plants_fts\"},\r\n {\"table\": \"global-power-plants_fts_data\"},\r\n {\"table\": \"global-power-plants_fts_idx\"},\r\n {\"table\": \"global-power-plants_fts_docsize\"},\r\n {\"table\": \"global-power-plants_fts_config\"},\r\n {\"table\": \"sqlite_stat1\"}]\r\nanalyze % sqlite-utils rows global-power-plants-analyzed.db sqlite_stat1 -t\r\ntbl idx stat\r\n------------------------------- ---------------------------------- ---------\r\nglobal-power-plants_fts_config global-power-plants_fts_config 1 1\r\nglobal-power-plants_fts_docsize 33643\r\nglobal-power-plants_fts_idx global-power-plants_fts_idx 199 40 1\r\nglobal-power-plants_fts_data 136\r\nglobal-power-plants \"global-power-plants_owner\" 33643 4\r\nglobal-power-plants \"global-power-plants_country_long\" 33643 202\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1097091527, "label": "Research how much of a difference analyze / sqlite_stat1 makes"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/369#issuecomment-1008219588", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/369", "id": 1008219588, "node_id": "IC_kwDOCGYnMM48GDXE", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-09T03:06:42Z", "updated_at": "2022-01-09T03:06:42Z", "author_association": "OWNER", "body": "```\r\nanalyze % sqlite-utils indexes global-power-plants.db -t \r\ntable index_name seqno cid name desc coll key\r\n------------------------------ ------------------------------------------------- ------- ----- ------------ ------ ------ -----\r\nglobal-power-plants \"global-power-plants_owner\" 0 12 owner 0 BINARY 1\r\nglobal-power-plants \"global-power-plants_country_long\" 0 1 country_long 0 BINARY 1\r\nglobal-power-plants_fts_idx sqlite_autoindex_global-power-plants_fts_idx_1 0 0 segid 0 BINARY 1\r\nglobal-power-plants_fts_idx sqlite_autoindex_global-power-plants_fts_idx_1 1 1 term 0 BINARY 1\r\nglobal-power-plants_fts_config sqlite_autoindex_global-power-plants_fts_config_1 0 0 k 0 BINARY 1\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1097091527, "label": "Research how much of a difference analyze / sqlite_stat1 makes"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/369#issuecomment-1008219484", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/369", "id": 1008219484, "node_id": "IC_kwDOCGYnMM48GDVc", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-09T03:05:44Z", "updated_at": "2022-01-09T03:05:44Z", "author_association": "OWNER", "body": "I'll start by running some experiments against the 11MB database file from https://global-power-plants.datasettes.com/global-power-plants.db", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1097091527, "label": "Research how much of a difference analyze / sqlite_stat1 makes"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/369#issuecomment-1008219191", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/369", "id": 1008219191, "node_id": "IC_kwDOCGYnMM48GDQ3", "user": {"value": 9599, "label": "simonw"}, 
"created_at": "2022-01-09T03:03:53Z", "updated_at": "2022-01-09T03:03:53Z", "author_association": "OWNER", "body": "Refs:\r\n- #366\r\n- #365", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1097091527, "label": "Research how much of a difference analyze / sqlite_stat1 makes"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1008163585", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/365", "id": 1008163585, "node_id": "IC_kwDOCGYnMM48F1sB", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-08T22:14:39Z", "updated_at": "2022-01-09T03:03:07Z", "author_association": "OWNER", "body": "The reason I'm hesitating on this is that I've not actually used ANALYZE at all in nearly five years of messing around with SQLite! So I'm nervous that there are surprise downsides I haven't thought of.\r\n\r\nMy hunch is that ANALYZE is only worth worrying about on much larger databases, in which case I'm OK supporting it as a thoroughly documented power-user feature rather than a default.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1096558279, "label": "create-index should run analyze after creating index"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/368#issuecomment-1008216371", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/368", "id": 1008216371, "node_id": "IC_kwDOCGYnMM48GCkz", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-09T02:36:22Z", "updated_at": "2022-01-09T02:36:22Z", "author_association": "OWNER", "body": "In Python 3.6: https://docs.python.org/3.6/library/subprocess.html\r\n\r\n> This does not capture stdout or stderr by default. 
To do so, pass [`PIPE`](https://docs.python.org/3.6/library/subprocess.html#subprocess.PIPE \"subprocess.PIPE\") for the *stdout* and/or *stderr* arguments.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1097087280, "label": "Offer `python -m sqlite_utils` as an alternative to `sqlite-utils`"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/368#issuecomment-1008216271", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/368", "id": 1008216271, "node_id": "IC_kwDOCGYnMM48GCjP", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-09T02:35:09Z", "updated_at": "2022-01-09T02:35:09Z", "author_association": "OWNER", "body": "Test failure on Python 3.6:\r\n\r\n> `E TypeError: __init__() got an unexpected keyword argument 'capture_output'`", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1097087280, "label": "Offer `python -m sqlite_utils` as an alternative to `sqlite-utils`"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/pull/367#issuecomment-1008158799", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/367", "id": 1008158799, "node_id": "IC_kwDOCGYnMM48F0hP", "user": {"value": 22429695, "label": "codecov[bot]"}, "created_at": "2022-01-08T21:36:55Z", "updated_at": "2022-01-09T02:34:44Z", "author_association": "NONE", "body": "# [Codecov](https://codecov.io/gh/simonw/sqlite-utils/pull/367?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison) Report\n> Merging [#367](https://codecov.io/gh/simonw/sqlite-utils/pull/367?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison) (9848eaa) into [main](https://codecov.io/gh/simonw/sqlite-utils/commit/a8f9cc6f64f299830834428509940d448b82b4ed?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison) (a8f9cc6) will **decrease** coverage by `0.20%`.\n> The diff coverage is `50.00%`.\n\n[![Impacted file tree graph](https://codecov.io/gh/simonw/sqlite-utils/pull/367/graphs/tree.svg?width=650&height=150&src=pr&token=O0X3703L9P&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison)](https://codecov.io/gh/simonw/sqlite-utils/pull/367?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison)\n\n```diff\n@@ Coverage Diff @@\n## main #367 +/- ##\n==========================================\n- Coverage 96.44% 96.24% -0.21% \n==========================================\n Files 5 6 +1 \n Lines 2307 2317 +10 \n==========================================\n+ Hits 2225 2230 +5 \n- Misses 82 87 +5 \n```\n\n\n| [Impacted Files](https://codecov.io/gh/simonw/sqlite-utils/pull/367?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison) | Coverage \u0394 | |\n|---|---|---|\n| [sqlite\\_utils/db.py](https://codecov.io/gh/simonw/sqlite-utils/pull/367/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison#diff-c3FsaXRlX3V0aWxzL2RiLnB5) | `97.15% <28.57%> (-0.42%)` | 
:arrow_down: |\n| [sqlite\\_utils/\\_\\_main\\_\\_.py](https://codecov.io/gh/simonw/sqlite-utils/pull/367/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison#diff-c3FsaXRlX3V0aWxzL19fbWFpbl9fLnB5) | `100.00% <100.00%> (\u00f8)` | |\n\n------\n\n[Continue to review full report at Codecov](https://codecov.io/gh/simonw/sqlite-utils/pull/367?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison).\n> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison)\n> `\u0394 = absolute (impact)`, `\u00f8 = not affected`, `? = missing data`\n> Powered by [Codecov](https://codecov.io/gh/simonw/sqlite-utils/pull/367?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison). Last update [a8f9cc6...9848eaa](https://codecov.io/gh/simonw/sqlite-utils/pull/367?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison).\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1097041471, "label": "Initial prototype of .analyze() methods"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008216201", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/364", "id": 1008216201, "node_id": "IC_kwDOCGYnMM48GCiJ", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-09T02:34:12Z", "updated_at": "2022-01-09T02:34:12Z", "author_association": "OWNER", "body": "I can now write tests that look like this: https://github.com/simonw/sqlite-utils/blob/539f5ccd90371fa87f946018f8b77d55929e06db/tests/test_cli.py#L2024-L2030\r\n\r\nWhich means I can write a test that exercises this bug.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1095570074, "label": "`--batch-size 1` doesn't seem to commit for every item"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/368#issuecomment-1008215912", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/368", "id": 1008215912, "node_id": "IC_kwDOCGYnMM48GCdo", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-09T02:30:59Z", "updated_at": "2022-01-09T02:30:59Z", "author_association": "OWNER", "body": "Even better, inspired by `rich`, support `python -m sqlite_utils`. 
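A minimal sketch of what `sqlite_utils/__main__.py` could look like (assuming `cli` is the Click group defined in `sqlite_utils.cli`):\r\n\r\n```python\r\nfrom sqlite_utils.cli import cli\r\n\r\nif __name__ == \"__main__\":\r\n    cli()\r\n```\r\n\r\nThe `rich` equivalent for reference: 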
https://github.com/Textualize/rich/blob/master/rich/__main__.py", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1097087280, "label": "Offer `python -m sqlite_utils` as an alternative to `sqlite-utils`"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008214998", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/364", "id": 1008214998, "node_id": "IC_kwDOCGYnMM48GCPW", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-09T02:23:20Z", "updated_at": "2022-01-09T02:23:20Z", "author_association": "OWNER", "body": "Possible way of running the test: add this to `sqlite_utils/cli.py`:\r\n\r\n```python\r\nif __name__ == \"__main__\":\r\n cli()\r\n```\r\nNow the tool can be run using `python -m sqlite_utils.cli --help`\r\n\r\nThen in the test use `subprocess` to call `sys.executable` (the path to the current Python interpreter) and pass it `-m sqlite_utils.cli` to run the script!", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1095570074, "label": "`--batch-size 1` doesn't seem to commit for every item"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008214406", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/364", "id": 1008214406, "node_id": "IC_kwDOCGYnMM48GCGG", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-09T02:18:21Z", "updated_at": "2022-01-09T02:18:21Z", "author_association": "OWNER", "body": "I'm having trouble figuring out the best way to write a unit test for this. 
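Roughly the shape of test I want - a sketch only, since it needs a real subprocess with a live stdin pipe (Click's `CliRunner` feeds stdin from an in-memory buffer, so it can't exercise this):\r\n\r\n```python\r\nimport sqlite3, subprocess, sys, time\r\n\r\n# Run the CLI as a real process so stdin is an actual pipe\r\np = subprocess.Popen(\r\n    [sys.executable, \"-m\", \"sqlite_utils.cli\", \"insert\", \"batch.db\", \"rows\", \"-\", \"--nl\", \"--batch-size\", \"1\"],\r\n    stdin=subprocess.PIPE,\r\n)\r\np.stdin.write(b'{\"id\": 1}\\n')\r\np.stdin.flush()\r\ntime.sleep(0.5)  # crude - give the process a moment to insert\r\n# With --batch-size 1 the first row should already be committed\r\nassert sqlite3.connect(\"batch.db\").execute(\"select count(*) from rows\").fetchone()[0] == 1\r\np.stdin.close()\r\np.wait()\r\n```\r\n\r\n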
Filed a relevant feature request for Click here:\r\n- https://github.com/pallets/click/issues/2171", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1095570074, "label": "`--batch-size 1` doesn't seem to commit for every item"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1008166084", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/365", "id": 1008166084, "node_id": "IC_kwDOCGYnMM48F2TE", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-01-08T22:32:47Z", "updated_at": "2022-01-08T22:32:47Z", "author_association": "CONTRIBUTOR", "body": "or using \u201c pragma optimize\u201d", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1096558279, "label": "create-index should run analyze after creating index"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1008164786", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/365", "id": 1008164786, "node_id": "IC_kwDOCGYnMM48F1-y", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-01-08T22:24:19Z", "updated_at": "2022-01-08T22:24:19Z", "author_association": "CONTRIBUTOR", "body": "the out-of-date scenario you describe could be addressed by automatically adding an analyze to the insert or convert commands if they implicate an index", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1096558279, "label": "create-index should run analyze after creating index"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1008164116", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/365", "id": 1008164116, "node_id": "IC_kwDOCGYnMM48F10U", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-01-08T22:18:57Z", "updated_at": "2022-01-08T22:18:57Z", "author_association": "CONTRIBUTOR", "body": "the table with the query ran so bad was about 50k. \r\n\r\ni think the scenario should not be worse than no stats. 
\r\n\r\ni also did not know that sqlite was so different from postgres and needed an explicit analyze call.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1096558279, "label": "create-index should run analyze after creating index"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1008163050", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/365", "id": 1008163050, "node_id": "IC_kwDOCGYnMM48F1jq", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-08T22:10:51Z", "updated_at": "2022-01-08T22:10:51Z", "author_association": "OWNER", "body": "Is there a downside to having a `sqlite_stat1` table if it has wildly incorrect statistics in it?\r\n\r\nImagine the following sequence of events:\r\n\r\n- User imports a few records, creating the table, using `sqlite-utils insert`\r\n- User runs `sqlite-utils create-index ...` which also creates and populates the `sqlite_stat1` table\r\n- User runs `insert` again to populate several million new records\r\n\r\nThe user now has a database file with several million records and a statistics table that is wildly out of date, having been populated when they only had a few.\r\n\r\nWill this result in surprisingly bad query performance compared to if that statistics table did not exist at all?\r\n\r\nIf so, I lean much harder towards `ANALYZE` as a strictly opt-in optimization, maybe with the `--analyze` option added to `sqlite-utils insert` too, to help users opt in to updating their statistics after running big inserts.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1096558279, "label": "create-index should run analyze after creating index"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1008161965", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/365", "id": 1008161965, "node_id": "IC_kwDOCGYnMM48F1St", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-01-08T22:02:56Z", "updated_at": "2022-01-08T22:02:56Z", "author_association": "CONTRIBUTOR", "body": "for options 2 and 3, i would worry about discoverability. \r\n\r\nin other db\u2019s it is not necessary to explicitly call analyze for most indices. ie for postgres\r\n\r\n> The system regularly collects statistics on all of a table's columns. 
Newly-created non-expression indexes can immediately use these statistics to determine an index's usefulness.\r\n\r\ni suppose i would propose raising a warning if the stats table is created that explains what is going on and informs users about a `--no-analyze` argument.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1096558279, "label": "create-index should run analyze after creating index"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/366#issuecomment-1008158616", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/366", "id": 1008158616, "node_id": "IC_kwDOCGYnMM48F0eY", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-08T21:35:32Z", "updated_at": "2022-01-08T21:35:32Z", "author_association": "OWNER", "body": "Built a prototype in a branch, see #367.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1096563265, "label": "Python library methods for calling ANALYZE"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1008158357", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/365", "id": 1008158357, "node_id": "IC_kwDOCGYnMM48F0aV", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-08T21:33:07Z", "updated_at": "2022-01-08T21:33:07Z", "author_association": "OWNER", "body": "The one thing that worries me a little bit about doing this by default is that it adds a surprising new table to the database - it may be confusing to users if they run `create-index` and their database suddenly has a new `sqlite_stat1` table, see https://github.com/simonw/sqlite-utils/issues/366#issuecomment-1008157132\r\n\r\nOptions here are:\r\n\r\n- Do it anyway. People can tolerate a surprise table appearing when they create an index.\r\n- Only run `ANALYZE` if the user says `sqlite-utils create-index ... 
--analyze`\r\n- Use the `--analyze` option, but also automatically run `ANALYZE` if they create an index and the database they are working with already has a `sqlite_stat1` table\r\n\r\nI'm currently leaning towards that third option - @fgregg any thoughts?", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1096558279, "label": "create-index should run analyze after creating index"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1587#issuecomment-1008157998", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1587", "id": 1008157998, "node_id": "IC_kwDOBm6k_c48F0Uu", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-08T21:29:54Z", "updated_at": "2022-01-08T21:29:54Z", "author_association": "OWNER", "body": "Relevant code: https://github.com/simonw/datasette/blob/00a2895cd2dc42c63846216b36b2dc9f41170129/datasette/database.py#L339-L354", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1097040427, "label": "Add `sqlite_stat1`(-4) tables to hidden table list"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1587#issuecomment-1008157908", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1587", "id": 1008157908, "node_id": "IC_kwDOBm6k_c48F0TU", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-08T21:29:06Z", "updated_at": "2022-01-08T21:29:06Z", "author_association": "OWNER", "body": "Depending on the SQLite version (and compile options) that ran `ANALYZE` these can be called:\r\n\r\n- `sqlite_stat1`\r\n- `sqlite_stat2`\r\n- `sqlite_stat3`\r\n- `sqlite_stat4`", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1097040427, "label": "Add `sqlite_stat1`(-4) tables to hidden table list"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/366#issuecomment-1008157132", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/366", "id": 1008157132, "node_id": "IC_kwDOCGYnMM48F0HM", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-08T21:23:08Z", "updated_at": "2022-01-08T21:25:05Z", "author_association": "OWNER", "body": "Running `ANALYZE` creates a new visible table called `sqlite_stat1`: https://www.sqlite.org/fileformat.html#the_sqlite_stat1_table\r\n\r\nThis should be added to the default list of hidden tables in Datasette.\r\n\r\nIt looks something like this:\r\n\r\n| tbl | idx | stat |\r\n|---------------------------------|------------------------------------|-----------|\r\n| _counts | sqlite_autoindex__counts_1 | 5 1 |\r\n| global-power-plants_fts_config | global-power-plants_fts_config | 1 1 |\r\n| global-power-plants_fts_docsize | | 33643 |\r\n| global-power-plants_fts_idx | global-power-plants_fts_idx | 199 40 1 |\r\n| global-power-plants_fts_data | | 136 |\r\n| global-power-plants | \"global-power-plants_owner\" | 33643 4 |\r\n| global-power-plants | \"global-power-plants_country_long\" | 33643 202 |\r\n\r\n> In each such row, the sqlite_stat.stat column will be a string consisting of a list of integers followed by zero or more arguments. The first integer in this list is the approximate number of rows in the index. 
(The number of rows in the index is the same as the number of rows in the table, except for partial indexes.) The second integer is the approximate number of rows in the index that have the same value in the first column of the index. The third integer is the number of rows in the index that have the same value for the first two columns. The N-th integer (for N>1) is the estimated average number of rows in the index which have the same value for the first N-1 columns. For a K-column index, there will be K+1 integers in the stat column. If the index is unique, then the last integer will be 1. ", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1096563265, "label": "Python library methods for calling ANALYZE"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008155916", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/364", "id": 1008155916, "node_id": "IC_kwDOCGYnMM48Fz0M", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-08T21:16:46Z", "updated_at": "2022-01-08T21:16:46Z", "author_association": "OWNER", "body": "No, `chunks()` seems to work OK in the test I just added.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1095570074, "label": "`--batch-size 1` doesn't seem to commit for every item"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008154873", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/364", "id": 1008154873, "node_id": "IC_kwDOCGYnMM48Fzj5", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-08T21:11:55Z", "updated_at": "2022-01-08T21:11:55Z", "author_association": "OWNER", "body": "I'm suspicious that the `chunks()` utility function may not be working correctly:\r\n```pycon\r\nIn [10]: [list(d) for d in list(chunks('abc', 5))]\r\nOut[10]: [['a'], ['b'], ['c']]\r\n\r\nIn [11]: [list(d) for d in list(chunks('abcdefghi', 5))]\r\nOut[11]: [['a'], ['b'], ['c'], ['d'], ['e'], ['f'], ['g'], ['h'], ['i']]\r\n\r\nIn [12]: [list(d) for d in list(chunks('abcdefghi', 3))]\r\nOut[12]: [['a'], ['b'], ['c'], ['d'], ['e'], ['f'], ['g'], ['h'], ['i']]\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1095570074, "label": "`--batch-size 1` doesn't seem to commit for every item"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008153586", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/364", "id": 1008153586, "node_id": "IC_kwDOCGYnMM48FzPy", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-08T21:06:15Z", "updated_at": "2022-01-08T21:06:15Z", "author_association": "OWNER", "body": "I added a print statement after `for query, params in queries_and_params` and confirmed that something in the code is waiting until 16 records are available to be inserted and then executing the inserts, even with `--batch-size 1`.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1095570074, "label": "`--batch-size 1` doesn't seem 
to commit for every item"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008151884", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/364", "id": 1008151884, "node_id": "IC_kwDOCGYnMM48Fy1M", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-08T20:59:21Z", "updated_at": "2022-01-08T20:59:21Z", "author_association": "OWNER", "body": "(That Heroku example doesn't record the timestamp, which limits its usefulness)", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1095570074, "label": "`--batch-size 1` doesn't seem to commit for every item"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008143248", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/364", "id": 1008143248, "node_id": "IC_kwDOCGYnMM48FwuQ", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-08T20:34:12Z", "updated_at": "2022-01-08T20:34:12Z", "author_association": "OWNER", "body": "Built that tool: https://github.com/simonw/stream-delay and https://pypi.org/project/stream-delay/", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1095570074, "label": "`--batch-size 1` doesn't seem to commit for every item"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008129841", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/364", "id": 1008129841, "node_id": "IC_kwDOCGYnMM48Ftcx", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-08T20:04:42Z", "updated_at": "2022-01-08T20:04:42Z", "author_association": "OWNER", "body": "It would be easier to test this if I had a utility for streaming out a file one line at a time.\r\n\r\nA few recipes for this in https://superuser.com/questions/526242/cat-file-to-terminal-at-particular-speed-of-lines-per-second - I'm going to build a quick `stream-delay` tool though.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1095570074, "label": "`--batch-size 1` doesn't seem to commit for every item"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/pull/1574#issuecomment-1007844190", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1574", "id": 1007844190, "node_id": "IC_kwDOBm6k_c48Ente", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-01-08T00:42:12Z", "updated_at": "2022-01-08T00:42:12Z", "author_association": "CONTRIBUTOR", "body": "is there a reason to not always use the slim option?", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1084193403, "label": "introduce new option for datasette package to use a slim base image"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1007643254", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/365", "id": 1007643254, "node_id": "IC_kwDOCGYnMM48D2p2", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-07T18:37:56Z", "updated_at": 
"2022-01-07T18:37:56Z", "author_association": "OWNER", "body": "Or I could leave off `--no-analyze` and tell people that if they want to add an index without running analyze they can execute the `CREATE INDEX` themselves.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1096558279, "label": "create-index should run analyze after creating index"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1007642831", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/365", "id": 1007642831, "node_id": "IC_kwDOCGYnMM48D2jP", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-07T18:37:18Z", "updated_at": "2022-01-07T18:37:18Z", "author_association": "OWNER", "body": "After implementing #366 I can make it so `sqlite-utils create-index` automatically runs `db.analyze(index_name)` afterwards, maybe with a `--no-analyze` option in case anyone wants to opt out of that for specific performance reasons.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1096558279, "label": "create-index should run analyze after creating index"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/366#issuecomment-1007641634", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/366", "id": 1007641634, "node_id": "IC_kwDOCGYnMM48D2Qi", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-07T18:35:35Z", "updated_at": "2022-01-07T18:35:35Z", "author_association": "OWNER", "body": "Since the existing CLI feature is this:\r\n\r\n $ sqlite-utils analyze-tables github.db tags\r\n\r\nI can add `sqlite-utils analyze` to reflect the Python library method.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1096563265, "label": "Python library methods for calling ANALYZE"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/366#issuecomment-1007639860", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/366", "id": 1007639860, "node_id": "IC_kwDOCGYnMM48D100", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-07T18:32:59Z", "updated_at": "2022-01-07T18:33:07Z", "author_association": "OWNER", "body": "From the SQLite docs:\r\n\r\n> If no arguments are given, all attached databases are analyzed. If a schema name is given as the argument, then all tables and indices in that one database are analyzed. If the argument is a table name, then only that table and the indices associated with that table are analyzed. 
If the argument is an index name, then only that one index is analyzed.\r\n\r\nSo I think this becomes two methods:\r\n\r\n- `db.analyze()` calls analyze on the whole database\r\n- `db.analyze(name_of_table_or_index)` for a specific named table or index\r\n- `table.analyze()` is a shortcut for `db.analyze(table.name)`", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1096563265, "label": "Python library methods for calling ANALYZE"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/366#issuecomment-1007637963", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/366", "id": 1007637963, "node_id": "IC_kwDOCGYnMM48D1XL", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-07T18:30:13Z", "updated_at": "2022-01-07T18:30:13Z", "author_association": "OWNER", "body": "Annoyingly I use the word \"analyze\" to mean something else in the CLI - for these features:\r\n\r\n- #207 \r\n- #320\r\n\r\nthere's only one method with a similar name in the Python library though and that's this one:\r\n\r\nhttps://github.com/simonw/sqlite-utils/blob/6e46b9913411682f3a3ec66f4d58886c1db8654b/sqlite_utils/db.py#L2904-L2906", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1096563265, "label": "Python library methods for calling ANALYZE"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1007636709", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/365", "id": 1007636709, "node_id": "IC_kwDOCGYnMM48D1Dl", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-01-07T18:28:33Z", "updated_at": "2022-01-07T18:29:43Z", "author_association": "CONTRIBUTOR", "body": "i added an index to one table with sqlite-utils, and then a query that used to take about 1 second started taking hundreds of seconds. \r\n\r\nrunning analyze got me back to sub second speed.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1096558279, "label": "create-index should run analyze after creating index"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1007634999", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/365", "id": 1007634999, "node_id": "IC_kwDOCGYnMM48D0o3", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-07T18:26:22Z", "updated_at": "2022-01-07T18:26:22Z", "author_association": "OWNER", "body": "I've not used the `ANALYZE` feature in SQLite at all before. 
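It's just a SQL statement though, so trying it out via the existing `db.execute()` is a one-liner - a rough sketch:\r\n\r\n```python\r\nimport sqlite_utils\r\n\r\ndb = sqlite_utils.Database(\"global-power-plants.db\")\r\ndb.execute(\"ANALYZE\")  # whole database\r\ndb.execute(\"ANALYZE [global-power-plants]\")  # just one table (or index)\r\n```\r\n\r\n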
Should probably add Python library methods for it.\r\n\r\nAnnoyingly I use the word \"analyze\" to mean something else in the CLI - for these features:\r\n- #207 \r\n- #320", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1096558279, "label": "create-index should run analyze after creating index"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1007633376", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/365", "id": 1007633376, "node_id": "IC_kwDOCGYnMM48D0Pg", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-07T18:24:07Z", "updated_at": "2022-01-07T18:24:07Z", "author_association": "OWNER", "body": "Relevant documentation: https://www.sqlite.org/lang_analyze.html", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1096558279, "label": "create-index should run analyze after creating index"}, "performed_via_github_app": null} {"html_url": "https://github.com/dogsheep/dogsheep-photos/pull/36#issuecomment-1006708046", "issue_url": "https://api.github.com/repos/dogsheep/dogsheep-photos/issues/36", "id": 1006708046, "node_id": "IC_kwDOD079W848ASVO", "user": {"value": 71983, "label": "scoates"}, "created_at": "2022-01-06T16:04:46Z", "updated_at": "2022-01-06T16:04:46Z", "author_association": "NONE", "body": "This one got me, today, too. \ud83d\udc4d", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 988493790, "label": "Correct naming of tool in readme"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/363#issuecomment-1006344080", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/363", "id": 1006344080, "node_id": "IC_kwDOCGYnMM47-5eQ", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-06T07:32:05Z", "updated_at": "2022-01-06T07:32:05Z", "author_association": "OWNER", "body": "As part of this work I should add test coverage of this error message too: https://github.com/simonw/sqlite-utils/blob/413f8ed754e38d7b190de888c85fe8438336cb11/sqlite_utils/cli.py#L826", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1094981339, "label": "Better error message if `--convert` code fails to return a dict"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/363#issuecomment-1006343303", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/363", "id": 1006343303, "node_id": "IC_kwDOCGYnMM47-5SH", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-06T07:30:20Z", "updated_at": "2022-01-06T07:30:20Z", "author_association": "OWNER", "body": "This check should run inside the `.insert_all()` method. 
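A sketch of the shape that check could take (the exception name here is illustrative, not a final API):\r\n\r\n```python\r\nclass NotADictionary(Exception):\r\n    \"A record passed to .insert_all() was not a dictionary\"\r\n\r\n# inside .insert_all(), while iterating over incoming records:\r\nfor record in chunk:\r\n    if not isinstance(record, dict):\r\n        raise NotADictionary(\"Expected a dict, got: {!r}\".format(record))\r\n```\r\n\r\n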
It should raise a custom exception which the CLI code can then catch and turn into a click error.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1094981339, "label": "Better error message if `--convert` code fails to return a dict"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/356#issuecomment-1006318443", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/356", "id": 1006318443, "node_id": "IC_kwDOCGYnMM47-zNr", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-06T06:30:13Z", "updated_at": "2022-01-06T06:30:13Z", "author_association": "OWNER", "body": "Documentation:\r\n\r\n- https://sqlite-utils.datasette.io/en/latest/cli.html#inserting-unstructured-data-with-lines-and-text\r\n- https://sqlite-utils.datasette.io/en/latest/cli.html#applying-conversions-while-inserting-data", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1077431957, "label": "`sqlite-utils insert --convert` option"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/356#issuecomment-1006318007", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/356", "id": 1006318007, "node_id": "IC_kwDOCGYnMM47-zG3", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-06T06:28:53Z", "updated_at": "2022-01-06T06:28:53Z", "author_association": "OWNER", "body": "Implemented in #361.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1077431957, "label": "`sqlite-utils insert --convert` option"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/pull/361#issuecomment-1006219956", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/361", "id": 1006219956, "node_id": "IC_kwDOCGYnMM47-bK0", "user": {"value": 22429695, "label": "codecov[bot]"}, "created_at": "2022-01-06T01:51:54Z", "updated_at": "2022-01-06T06:22:25Z", "author_association": "NONE", "body": "# [Codecov](https://codecov.io/gh/simonw/sqlite-utils/pull/361?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison) Report\n> Merging [#361](https://codecov.io/gh/simonw/sqlite-utils/pull/361?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison) (b7f0b88) into [main](https://codecov.io/gh/simonw/sqlite-utils/commit/f3fd8613113d21d44238a6ec54b375f5aa72c4e0?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison) (f3fd861) will **decrease** coverage by `0.05%`.\n> The diff coverage is `92.85%`.\n\n[![Impacted file tree graph](https://codecov.io/gh/simonw/sqlite-utils/pull/361/graphs/tree.svg?width=650&height=150&src=pr&token=O0X3703L9P&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison)](https://codecov.io/gh/simonw/sqlite-utils/pull/361?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison)\n\n```diff\n@@ Coverage Diff @@\n## main #361 +/- ##\n==========================================\n- Coverage 96.49% 96.44% 
-0.06% \n==========================================\n Files 5 5 \n Lines 2283 2306 +23 \n==========================================\n+ Hits 2203 2224 +21 \n- Misses 80 82 +2 \n```\n\n\n| [Impacted Files](https://codecov.io/gh/simonw/sqlite-utils/pull/361?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison) | Coverage \u0394 | |\n|---|---|---|\n| [sqlite\\_utils/cli.py](https://codecov.io/gh/simonw/sqlite-utils/pull/361/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison#diff-c3FsaXRlX3V0aWxzL2NsaS5weQ==) | `95.49% <92.00%> (-0.11%)` | :arrow_down: |\n| [sqlite\\_utils/utils.py](https://codecov.io/gh/simonw/sqlite-utils/pull/361/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison#diff-c3FsaXRlX3V0aWxzL3V0aWxzLnB5) | `94.23% <100.00%> (\u00f8)` | |\n\n------\n\n[Continue to review full report at Codecov](https://codecov.io/gh/simonw/sqlite-utils/pull/361?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison).\n> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison)\n> `\u0394 = absolute (impact)`, `\u00f8 = not affected`, `? = missing data`\n> Powered by [Codecov](https://codecov.io/gh/simonw/sqlite-utils/pull/361?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison). Last update [f3fd861...b7f0b88](https://codecov.io/gh/simonw/sqlite-utils/pull/361?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison).\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1094890366, "label": "--lines and --text and --convert and --import"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/pull/361#issuecomment-1006315145", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/361", "id": 1006315145, "node_id": "IC_kwDOCGYnMM47-yaJ", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-06T06:20:51Z", "updated_at": "2022-01-06T06:20:51Z", "author_association": "OWNER", "body": "This is all documented. 
I'm going to rebase-merge it to keep the individual commits.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1094890366, "label": "--lines and --text and --convert and --import"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/pull/361#issuecomment-1006311742", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/361", "id": 1006311742, "node_id": "IC_kwDOCGYnMM47-xk-", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-06T06:12:19Z", "updated_at": "2022-01-06T06:12:19Z", "author_association": "OWNER", "body": "Got that working:\r\n```\r\n% echo 'This is cool' | sqlite-utils insert words.db words - --text --convert '({\"word\": w} for w in text.split())'\r\n% sqlite-utils dump words.db \r\nBEGIN TRANSACTION;\r\nCREATE TABLE [words] (\r\n [word] TEXT\r\n);\r\nINSERT INTO \"words\" VALUES('This');\r\nINSERT INTO \"words\" VALUES('is');\r\nINSERT INTO \"words\" VALUES('cool');\r\nCOMMIT;\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1094890366, "label": "--lines and --text and --convert and --import"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/pull/361#issuecomment-1006309834", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/361", "id": 1006309834, "node_id": "IC_kwDOCGYnMM47-xHK", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-06T06:08:01Z", "updated_at": "2022-01-06T06:08:01Z", "author_association": "OWNER", "body": "For `--text` the conversion function should be allowed to return an iterable instead of a dictionary, in which case it will be treated as the full list of records to be inserted.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1094890366, "label": "--lines and --text and --convert and --import"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/pull/361#issuecomment-1006301546", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/361", "id": 1006301546, "node_id": "IC_kwDOCGYnMM47-vFq", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-06T05:44:47Z", "updated_at": "2022-01-06T05:44:47Z", "author_association": "OWNER", "body": "Just need documentation for `--convert` now against the various different types of input.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1094890366, "label": "--lines and --text and --convert and --import"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/pull/361#issuecomment-1006300280", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/361", "id": 1006300280, "node_id": "IC_kwDOCGYnMM47-ux4", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-06T05:40:45Z", "updated_at": "2022-01-06T05:40:45Z", "author_association": "OWNER", "body": "I'm going to rename `--all` to `--text`:\r\n\r\n> - Use `--text` to write the entire input to a column called \"text\"\r\n\r\nTo avoid that clash with Python's `all()` function.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, 
\"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1094890366, "label": "--lines and --text and --convert and --import"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/pull/361#issuecomment-1006299778", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/361", "id": 1006299778, "node_id": "IC_kwDOCGYnMM47-uqC", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-06T05:39:10Z", "updated_at": "2022-01-06T05:39:10Z", "author_association": "OWNER", "body": "`all` is a bad variable name because it clashes with the Python `all()` built-in function.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1094890366, "label": "--lines and --text and --convert and --import"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/pull/361#issuecomment-1006295276", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/361", "id": 1006295276, "node_id": "IC_kwDOCGYnMM47-tjs", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-06T05:26:11Z", "updated_at": "2022-01-06T05:26:11Z", "author_association": "OWNER", "body": "Here's the traceback if your `--convert` function doesn't return a dict right now:\r\n```\r\n% sqlite-utils insert /tmp/all.db blah /tmp/log.log --convert 'all.upper()' --all \r\n\r\nTraceback (most recent call last):\r\n File \"/Users/simon/.local/share/virtualenvs/sqlite-utils-C4Ilevlm/bin/sqlite-utils\", line 33, in \r\n sys.exit(load_entry_point('sqlite-utils', 'console_scripts', 'sqlite-utils')())\r\n File \"/Users/simon/.local/share/virtualenvs/sqlite-utils-C4Ilevlm/lib/python3.8/site-packages/click/core.py\", line 1137, in __call__\r\n return self.main(*args, **kwargs)\r\n File \"/Users/simon/.local/share/virtualenvs/sqlite-utils-C4Ilevlm/lib/python3.8/site-packages/click/core.py\", line 1062, in main\r\n rv = self.invoke(ctx)\r\n File \"/Users/simon/.local/share/virtualenvs/sqlite-utils-C4Ilevlm/lib/python3.8/site-packages/click/core.py\", line 1668, in invoke\r\n return _process_result(sub_ctx.command.invoke(sub_ctx))\r\n File \"/Users/simon/.local/share/virtualenvs/sqlite-utils-C4Ilevlm/lib/python3.8/site-packages/click/core.py\", line 1404, in invoke\r\n return ctx.invoke(self.callback, **ctx.params)\r\n File \"/Users/simon/.local/share/virtualenvs/sqlite-utils-C4Ilevlm/lib/python3.8/site-packages/click/core.py\", line 763, in invoke\r\n return __callback(*args, **kwargs)\r\n File \"/Users/simon/Dropbox/Development/sqlite-utils/sqlite_utils/cli.py\", line 949, in insert\r\n insert_upsert_implementation(\r\n File \"/Users/simon/Dropbox/Development/sqlite-utils/sqlite_utils/cli.py\", line 834, in insert_upsert_implementation\r\n db[table].insert_all(\r\n File \"/Users/simon/Dropbox/Development/sqlite-utils/sqlite_utils/db.py\", line 2602, in insert_all\r\n first_record = next(records)\r\n File \"/Users/simon/Dropbox/Development/sqlite-utils/sqlite_utils/db.py\", line 3044, in fix_square_braces\r\n for record in records:\r\n File \"/Users/simon/Dropbox/Development/sqlite-utils/sqlite_utils/cli.py\", line 831, in \r\n docs = (decode_base64_values(doc) for doc in docs)\r\n File \"/Users/simon/Dropbox/Development/sqlite-utils/sqlite_utils/utils.py\", line 86, in decode_base64_values\r\n to_fix = [\r\n File \"/Users/simon/Dropbox/Development/sqlite-utils/sqlite_utils/utils.py\", 
line 89, in \r\n if isinstance(doc[k], dict)\r\nTypeError: string indices must be integers\r\n```\r\nI can live with that for the moment.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1094890366, "label": "--lines and --text and --convert and --import"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/pull/361#issuecomment-1006294777", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/361", "id": 1006294777, "node_id": "IC_kwDOCGYnMM47-tb5", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-06T05:24:54Z", "updated_at": "2022-01-06T05:24:54Z", "author_association": "OWNER", "body": "> I added a custom error message for if the user's `--convert` code doesn't return a dict.\r\n\r\nThat turned out to be a bad idea because it meant exhausting the iterator early for the check - before we got to the `.insert_all()` code that breaks the iterator up into chunks. I tried fixing that with `itertools.tee()` to run the generator twice but that's grossly memory-inefficient for large imports.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1094890366, "label": "--lines and --text and --convert and --import"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/pull/361#issuecomment-1006288444", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/361", "id": 1006288444, "node_id": "IC_kwDOCGYnMM47-r48", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-06T05:07:10Z", "updated_at": "2022-01-06T05:07:10Z", "author_association": "OWNER", "body": "And here's a demo of `--convert` used with `--all` - I added a custom error message for if the user's `--convert` code doesn't return a dict.\r\n\r\n```\r\n% sqlite-utils insert /tmp/all.db blah /tmp/log.log --convert 'all.upper()' --all \r\nError: Records returned by your --convert function must be dicts\r\n% sqlite-utils insert /tmp/all.db blah /tmp/log.log --convert '{\"all\": all.upper()}' --all\r\n% sqlite-utils dump /tmp/all.db \r\nBEGIN TRANSACTION;\r\nCREATE TABLE [blah] (\r\n [all] TEXT\r\n);\r\nINSERT INTO \"blah\" VALUES('INFO: 127.0.0.1:60581 - \"GET / HTTP/1.1\" 200 OK\r\nINFO: 127.0.0.1:60581 - \"GET /FOO/-/STATIC/APP.CSS?CEAD5A HTTP/1.1\" 200 OK\r\nINFO: 127.0.0.1:60581 - \"GET /FAVICON.ICO HTTP/1.1\" 200 OK\r\nINFO: 127.0.0.1:60581 - \"GET /FOO/TIDDLYWIKI HTTP/1.1\" 200 OK\r\nINFO: 127.0.0.1:60581 - \"GET /FOO/-/STATIC/APP.CSS?CEAD5A HTTP/1.1\" 200 OK\r\nINFO: 127.0.0.1:60584 - \"GET /FOO/-/STATIC/SQL-FORMATTER-2.3.3.MIN.JS HTTP/1.1\" 200 OK\r\nINFO: 127.0.0.1:60586 - \"GET /FOO/-/STATIC/CODEMIRROR-5.57.0.MIN.JS HTTP/1.1\" 200 OK\r\nINFO: 127.0.0.1:60585 - \"GET /FOO/-/STATIC/CODEMIRROR-5.57.0.MIN.CSS HTTP/1.1\" 200 OK\r\nINFO: 127.0.0.1:60588 - \"GET /FOO/-/STATIC/CODEMIRROR-5.57.0-SQL.MIN.JS HTTP/1.1\" 200 OK\r\nINFO: 127.0.0.1:60587 - \"GET /FOO/-/STATIC/CM-RESIZE-1.0.1.MIN.JS HTTP/1.1\" 200 OK\r\nINFO: 127.0.0.1:60586 - \"GET /FOO/TIDDLYWIKI/TIDDLERS HTTP/1.1\" 200 OK\r\nINFO: 127.0.0.1:60586 - \"GET /FOO/-/STATIC/APP.CSS?CEAD5A HTTP/1.1\" 200 OK\r\nINFO: 127.0.0.1:60584 - \"GET /FOO/-/STATIC/TABLE.JS HTTP/1.1\" 200 OK\r\n');\r\nCOMMIT;\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, 
\"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1094890366, "label": "--lines and --text and --convert and --import"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/pull/361#issuecomment-1006284673", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/361", "id": 1006284673, "node_id": "IC_kwDOCGYnMM47-q-B", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-06T04:55:52Z", "updated_at": "2022-01-06T04:55:52Z", "author_association": "OWNER", "body": "Test code that just worked for me:\r\n```\r\nsqlite-utils insert /tmp/blah.db blah /tmp/log.log --convert '\r\nbits = line.split()\r\nreturn dict([(\"b_{}\".format(i), bit) for i, bit in enumerate(bits)])' --lines\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1094890366, "label": "--lines and --text and --convert and --import"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/pull/361#issuecomment-1006232013", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/361", "id": 1006232013, "node_id": "IC_kwDOCGYnMM47-eHN", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-06T02:21:35Z", "updated_at": "2022-01-06T02:21:35Z", "author_association": "OWNER", "body": "I'm having second thoughts about this bit:\r\n\r\n> Your Python code will be passed a \"row\" variable representing the imported row, and can return a modified row.\r\n>\r\n> If you are using `--lines` your code will be passed a \"line\" variable, and for `--all` an \"all\" variable.\r\n\r\nThe code in question is this:\r\n\r\nhttps://github.com/simonw/sqlite-utils/blob/500a35ad4d91c8a6232134ce9406efec11bedff8/sqlite_utils/utils.py#L296-L303\r\n\r\nDo I really want to add the complexity of supporting different variable names there? I think always using `value` might be better.\r\n\r\nExcept... 
`value` made sense for the existing `sqlite-utils convert` command where you are running a conversion function against the value for the column in the current row - is it confusing if applied to lines or documents or `all`?", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1094890366, "label": "--lines and --text and --convert and --import"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/pull/361#issuecomment-1006230411", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/361", "id": 1006230411, "node_id": "IC_kwDOCGYnMM47-duL", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-06T02:17:35Z", "updated_at": "2022-01-06T02:17:35Z", "author_association": "OWNER", "body": "Documentation: https://github.com/simonw/sqlite-utils/blob/33223856ff7fe746b7b77750fbe5b218531d0545/docs/cli.rst#inserting-unstructured-data-with---lines-and---all - I went with a single section titled \"Inserting unstructured data with --lines and --all\"", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1094890366, "label": "--lines and --text and --convert and --import"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/pull/361#issuecomment-1006220129", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/361", "id": 1006220129, "node_id": "IC_kwDOCGYnMM47-bNh", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-06T01:52:26Z", "updated_at": "2022-01-06T01:52:26Z", "author_association": "OWNER", "body": "I'm going to refactor all of the tests for `sqlite-utils insert` into a new `test_cli_insert.py` module.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1094890366, "label": "--lines and --text and --convert and --import"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/pull/361#issuecomment-1006219848", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/361", "id": 1006219848, "node_id": "IC_kwDOCGYnMM47-bJI", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-06T01:51:36Z", "updated_at": "2022-01-06T01:51:36Z", "author_association": "OWNER", "body": "So far I've just implemented the new help:\r\n```\r\n% sqlite-utils insert --help\r\nUsage: sqlite-utils insert [OPTIONS] PATH TABLE FILE\r\n\r\n Insert records from FILE into a table, creating the table if it does not\r\n already exist.\r\n\r\n By default the input is expected to be a JSON array of objects. Or:\r\n\r\n - Use --nl for newline-delimited JSON objects\r\n - Use --csv or --tsv for comma-separated or tab-separated input\r\n - Use --lines to write each incoming line to a column called \"line\"\r\n - Use --all to write the entire input to a column called \"all\"\r\n\r\n You can also use --convert to pass a fragment of Python code that will be\r\n used to convert each input.\r\n\r\n Your Python code will be passed a \"row\" variable representing the imported\r\n row, and can return a modified row.\r\n\r\n If you are using --lines your code will be passed a \"line\" variable, and for\r\n --all an \"all\" variable.\r\n\r\nOptions:\r\n --pk TEXT Columns to use as the primary key, e.g. 
id\r\n --flatten Flatten nested JSON objects, so {\"a\": {\"b\": 1}}\r\n becomes {\"a_b\": 1}\r\n --nl Expect newline-delimited JSON\r\n -c, --csv Expect CSV input\r\n --tsv Expect TSV input\r\n --lines Treat each line as a single value called 'line'\r\n --all Treat input as a single value called 'all'\r\n --convert TEXT Python code to convert each item\r\n --import TEXT Python modules to import\r\n --delimiter TEXT Delimiter to use for CSV files\r\n --quotechar TEXT Quote character to use for CSV/TSV\r\n --sniff Detect delimiter and quote character\r\n --no-headers CSV file has no header row\r\n --batch-size INTEGER Commit every X records\r\n --alter Alter existing table to add any missing columns\r\n --not-null TEXT Columns that should be created as NOT NULL\r\n --default ... Default value that should be set for a column\r\n --encoding TEXT Character encoding for input, defaults to utf-8\r\n -d, --detect-types Detect types for columns in CSV/TSV data\r\n --load-extension TEXT SQLite extensions to load\r\n --silent Do not show progress bar\r\n --ignore Ignore records if pk already exists\r\n --replace Replace records if pk already exists\r\n --truncate Truncate table before inserting records, if table\r\n already exists\r\n -h, --help Show this message and exit.\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1094890366, "label": "--lines and --text and --convert and --import"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/356#issuecomment-997496626", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/356", "id": 997496626, "node_id": "IC_kwDOCGYnMM47dJcy", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-12-20T00:38:15Z", "updated_at": "2022-01-06T01:29:03Z", "author_association": "OWNER", "body": "The implementation of this gets a tiny bit complicated.\r\n\r\nIgnoring `--convert`, the `--lines` option can internally produce `{\"line\": ...}` records and the `--all` option can produce `{\"all\": ...}` records.\r\n\r\nBut... when `--convert` is used, what should the code run against?\r\n\r\nIt could run against those already-converted records but that's a little bit strange, since you'd have to do this:\r\n\r\n sqlite-utils insert blah.db blah myfile.txt --all --convert '{\"item\": s for s in value[\"all\"].split(\"-\")}'\r\n\r\nHaving to use `value[\"all\"]` there is unintuitive. It would be nicer to have a `all` variable to work against.\r\n\r\nBut then for `--lines` should the local variable be called `line`? 
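Sketching the two naming options side by side (invocations invented for illustration):\r\n```\r\n# one shared name for every mode:\r\nsqlite-utils insert logs.db lines log.txt --lines --convert '{\"upper\": value[\"line\"].upper()}'\r\n\r\n# a per-mode name:\r\nsqlite-utils insert logs.db lines log.txt --lines --convert '{\"upper\": line.upper()}'\r\n```\r\n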
And how best to summarize these different names for local variables in the inline help for the feature?", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1077431957, "label": "`sqlite-utils insert --convert` option"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/360#issuecomment-1006211113", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/360", "id": 1006211113, "node_id": "IC_kwDOCGYnMM47-ZAp", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-06T01:27:53Z", "updated_at": "2022-01-06T01:27:53Z", "author_association": "OWNER", "body": "It looks like you were using `sqlite-utils memory` - that works by loading the entire file into an in-memory database, so 170GB is very likely to run out of RAM.\r\n\r\nThe line of code there exhibits another problem: it's reading the entire JSON file into a Python string, so it looks like it's going to run out of RAM even before it gets to the SQLite in-memory database section.\r\n\r\nTo handle a file of this size you'd need to write it to a SQLite database on-disk first. The `sqlite-utils insert` command can do this, and it should be able to \"stream\" records in from a file without loading the entire thing into memory - but only for JSON-NL and CSV/TSV formats, not for JSON arrays.\r\n\r\nThe code in question is here:\r\n\r\nhttps://github.com/simonw/sqlite-utils/blob/f3fd8613113d21d44238a6ec54b375f5aa72c4e0/sqlite_utils/cli.py#L738-L773\r\n\r\nThat's using Python generators for the CSV/TSV/JSON-NL variants... but it's doing this for regular JSON which requires reading the entire thing into memory:\r\n\r\nhttps://github.com/simonw/sqlite-utils/blob/f3fd8613113d21d44238a6ec54b375f5aa72c4e0/sqlite_utils/cli.py#L767\r\n\r\nIf you have the ability to control how your 170GB file is generated you may have more luck converting it to CSV or TSV or newline-delimited JSON, then using `sqlite-utils insert` to insert it into a database file.\r\n\r\nTo be honest though I've never tested this tooling with anything nearly that big, so it's possible you'll still run into problems. 
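For example, once the file is converted to newline-delimited JSON the insert can stream it without loading everything at once (illustrative command - the table and file names are made up):\r\n```\r\nsqlite-utils insert big.db rows data.ndjson --nl --batch-size 10000\r\n```\r\nEven then, a file this size may surface new problems. 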
If you do I'd love to hear about them!\r\n\r\nI would be tempted to tackle this size of job by writing a custom Python script, either using the `sqlite_utils` Python library or even calling `sqlite3` directly.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1091819089, "label": "MemoryError"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1534#issuecomment-1005975080", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1534", "id": 1005975080, "node_id": "IC_kwDOBm6k_c479fYo", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-05T18:29:06Z", "updated_at": "2022-01-05T18:29:06Z", "author_association": "OWNER", "body": "A really big downside to this is that it turns out many CDNs - apparently including Cloudflare - don't support the Vary header at all!\r\n\r\nMore in this thread: https://twitter.com/simonw/status/1478470282931163137", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1065432388, "label": "Maybe return JSON from HTML pages if `Accept: application/json` is sent"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1585#issuecomment-1003575286", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1585", "id": 1003575286, "node_id": "IC_kwDOBm6k_c470Vf2", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-01T15:40:38Z", "updated_at": "2022-01-01T15:40:38Z", "author_association": "OWNER", "body": "API tutorial: https://firebase.google.com/docs/hosting/api-deploy", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1091838742, "label": "Fire base caching for `publish cloudrun`"}, "performed_via_github_app": null} {"html_url": "https://github.com/dogsheep/google-takeout-to-sqlite/pull/8#issuecomment-1003437288", "issue_url": "https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/8", "id": 1003437288, "node_id": "IC_kwDODFE5qs47zzzo", "user": {"value": 28565, "label": "maxhawkins"}, "created_at": "2021-12-31T19:06:20Z", "updated_at": "2021-12-31T19:06:20Z", "author_association": "NONE", "body": "> @maxhawkins how hard would it be to add an entry to the table that includes the HTML version of the email, if it exists? I just attempted your the PR branch on a very small mbox file, and it worked great. My use case is a research project and I need to access more than just the body plain text.\r\n\r\nShouldn't be hard. 
The easiest way is probably to remove the `if body.content_type == \"text/html\"` clause from [utils.py:254](https://github.com/dogsheep/google-takeout-to-sqlite/pull/8/commits/8e6d487b697ce2e8ad885acf613a157bfba84c59#diff-25ad9dd1ced1b8bfc37fda8444819c803232c08891e4af3d4064aa205d8174eaR254) and just return content directly without parsing.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 954546309, "label": "Add Gmail takeout mbox import (v2)"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1583#issuecomment-1002825217", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1583", "id": 1002825217, "node_id": "IC_kwDOBm6k_c47xeYB", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2021-12-30T00:34:16Z", "updated_at": "2021-12-30T00:34:16Z", "author_association": "CONTRIBUTOR", "body": "If that is not desirable, it might be good to document that users might want to set up a lifecycle rule to automatically delete these build artifacts. Something like https://stackoverflow.com/questions/59937542/can-i-delete-container-images-from-google-cloud-storage-artifacts-bucket", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1090810196, "label": "consider adding deletion step of cloudbuild artifacts to gcloud publish"}, "performed_via_github_app": null} {"html_url": "https://github.com/dogsheep/google-takeout-to-sqlite/pull/8#issuecomment-1002735370", "issue_url": "https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/8", "id": 1002735370, "node_id": "IC_kwDODFE5qs47xIcK", "user": {"value": 203343, "label": "Btibert3"}, "created_at": "2021-12-29T18:58:23Z", "updated_at": "2021-12-29T18:58:23Z", "author_association": "NONE", "body": "@maxhawkins how hard would it be to add an entry to the table that includes the HTML version of the email, if it exists? I just attempted your PR branch on a very small mbox file, and it worked great. My use case is a research project and I need to access more than just the plain text body.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 954546309, "label": "Add Gmail takeout mbox import (v2)"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1152#issuecomment-1001791592", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1152", "id": 1001791592, "node_id": "IC_kwDOBm6k_c47tiBo", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-12-27T23:04:31Z", "updated_at": "2021-12-27T23:04:31Z", "author_association": "OWNER", "body": "Another option: rethink permissions to always work in terms of where clauses used as part of a SQL query that returns the overall allowed set of databases or tables. 
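The rough idea, sketched (the table and column names here are purely illustrative):\r\n```sql\r\nselect database, name\r\nfrom candidate_tables\r\nwhere database = 'fixtures'\r\n  and name not like 'secret%'\r\n```\r\nDatasette could then answer \"which tables can this user see?\" with a single query, rather than one permission check per table. 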
This would require rethinking existing permissions but it might be worthwhile prior to 1.0.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 770598024, "label": "Efficiently calculate list of databases/tables a user can view"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/878#issuecomment-1001699559", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/878", "id": 1001699559, "node_id": "IC_kwDOBm6k_c47tLjn", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-12-27T18:53:04Z", "updated_at": "2021-12-27T18:53:04Z", "author_association": "OWNER", "body": "I'm going to see if I can come up with the simplest possible version of this pattern for the `/-/metadata` and `/-/metadata.json` page, then try it for the database query page, before tackling the much more complex table page.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 648435885, "label": "New pattern for views that return either JSON or HTML, available for plugins"}, "performed_via_github_app": null} {"html_url": "https://github.com/dogsheep/twitter-to-sqlite/issues/62#issuecomment-1001222213", "issue_url": "https://api.github.com/repos/dogsheep/twitter-to-sqlite/issues/62", "id": 1001222213, "node_id": "IC_kwDODEm0Qs47rXBF", "user": {"value": 6764957, "label": "swyxio"}, "created_at": "2021-12-26T17:59:25Z", "updated_at": "2021-12-26T17:59:25Z", "author_association": "NONE", "body": "just confirmed that this error does not occur when i use my public main account. gets more interesting!", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1088816961, "label": "KeyError: 'created_at' for private accounts?"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/228#issuecomment-1001115286", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/228", "id": 1001115286, "node_id": "IC_kwDOCGYnMM47q86W", "user": {"value": 1206106, "label": "agguser"}, "created_at": "2021-12-26T07:01:31Z", "updated_at": "2021-12-26T07:01:31Z", "author_association": "NONE", "body": "`--no-headers` does not work?\r\n```\r\n$ echo 'a,1\\nb,2' | sqlite-utils memory --no-headers -t - 'select * from stdin'\r\na 1 \r\n--- --- \r\nb 2 \r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 807437089, "label": "--no-headers option for CSV and TSV"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1576#issuecomment-1000935523", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1576", "id": 1000935523, "node_id": "IC_kwDOBm6k_c47qRBj", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-12-24T21:33:05Z", "updated_at": "2021-12-24T21:33:05Z", "author_association": "OWNER", "body": "Another option would be to attempt to import `contextvars` and, if the import fails (for Python 3.6) continue using the current mechanism - then let Python 3.6 users know in the documentation that under Python 3.6 they will miss out on nested traces.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, 
\"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1087181951, "label": "Traces should include SQL executed by subtasks created with `asyncio.gather`"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1577#issuecomment-1000673444", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1577", "id": 1000673444, "node_id": "IC_kwDOBm6k_c47pRCk", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-12-24T06:08:58Z", "updated_at": "2021-12-24T06:08:58Z", "author_association": "OWNER", "body": "https://pypistats.org/packages/datasette shows a breakdown of downloads by Python version:\r\n\r\n\"image\"\r\n\r\nIt looks like on a recent day I had 4,071 downloads from Python 3.7... and just 2 downloads from Python 3.6!", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1087913724, "label": "Drop support for Python 3.6"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1534#issuecomment-1000535904", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1534", "id": 1000535904, "node_id": "IC_kwDOBm6k_c47ovdg", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-12-23T21:44:31Z", "updated_at": "2021-12-23T21:44:31Z", "author_association": "OWNER", "body": "A big downside to this is that I would need to use `Vary: Accept` for when Datasette is running behind a cache such as Cloudflare - would that greatly reduce overall cache efficiency due to subtle variations in the accept headers sent by common browsers?", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1065432388, "label": "Maybe return JSON from HTML pages if `Accept: application/json` is sent"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1579#issuecomment-1000485719", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1579", "id": 1000485719, "node_id": "IC_kwDOBm6k_c47ojNX", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-12-23T19:19:45Z", "updated_at": "2021-12-23T19:19:45Z", "author_association": "OWNER", "body": "All of those removed `block=True` lines in 8c401ee0f054de2f568c3a8302c9223555146407 really help confirm to me that this was a good decision.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1087931918, "label": "`.execute_write(... 
block=True)` should be the default behaviour"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1579#issuecomment-1000485505", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1579", "id": 1000485505, "node_id": "IC_kwDOBm6k_c47ojKB", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-12-23T19:19:13Z", "updated_at": "2021-12-23T19:19:13Z", "author_association": "OWNER", "body": "Updated docs for `execute_write_fn()`: https://github.com/simonw/datasette/blob/75153ea9b94d09ec3d61f7c6ebdf378e0c0c7a0b/docs/internals.rst#await-dbexecute_write_fnfn-blocktrue", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1087931918, "label": "`.execute_write(... block=True)` should be the default behaviour"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1579#issuecomment-1000481686", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1579", "id": 1000481686, "node_id": "IC_kwDOBm6k_c47oiOW", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-12-23T19:09:23Z", "updated_at": "2021-12-23T19:09:23Z", "author_association": "OWNER", "body": "Re-opening this because I missed updating some of the docs, and I also need to update Datasette's own code to not use `block=True` in a bunch of places.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1087931918, "label": "`.execute_write(... block=True)` should be the default behaviour"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1579#issuecomment-1000479737", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1579", "id": 1000479737, "node_id": "IC_kwDOBm6k_c47ohv5", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-12-23T19:04:23Z", "updated_at": "2021-12-23T19:04:23Z", "author_association": "OWNER", "body": "Updated documentation: https://github.com/simonw/datasette/blob/00a2895cd2dc42c63846216b36b2dc9f41170129/docs/internals.rst#await-dbexecute_writesql-paramsnone-blocktrue", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1087931918, "label": "`.execute_write(... block=True)` should be the default behaviour"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1579#issuecomment-1000477813", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1579", "id": 1000477813, "node_id": "IC_kwDOBm6k_c47ohR1", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-12-23T18:59:41Z", "updated_at": "2021-12-23T18:59:41Z", "author_association": "OWNER", "body": "I'm going to go with `execute_write(..., block=False)` as the mechanism for fire-and-forget write queries.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1087931918, "label": "`.execute_write(... 
block=True)` should be the default behaviour"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1579#issuecomment-1000477621", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1579", "id": 1000477621, "node_id": "IC_kwDOBm6k_c47ohO1", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-12-23T18:59:12Z", "updated_at": "2021-12-23T18:59:12Z", "author_association": "OWNER", "body": "The easiest way to change this would be to default to `block=True` such that you need to pass `block=False` to the APIs to have them do fire-and-forget.\r\n\r\nAn alternative would be to add new, separately named methods which do the fire-and-forget thing.\r\n\r\nIf I hadn't recently added `execute_write_script` and `execute_write_many` in #1570 I'd be more into this idea, but I don't want to end up with eight methods - `execute_write`, `execute_write_queue`, `execute_write_many`, `execute_write_many_queue`, `execute_write_script`, `execute_write_script_queue`, `execute_write_fn`, `execute_write_fn_queue`.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1087931918, "label": "`.execute_write(... block=True)` should be the default behaviour"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1579#issuecomment-1000476413", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1579", "id": 1000476413, "node_id": "IC_kwDOBm6k_c47og79", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-12-23T18:56:06Z", "updated_at": "2021-12-23T18:56:06Z", "author_association": "OWNER", "body": "This is technically a breaking change, but a GitHub code search at https://cs.github.com/?scopeName=All+repos&scope=&q=execute_write%20datasette%20-owner%3Asimonw shows only one repo not-owned-by-me using this, and they're using `block=True`: https://github.com/mfa/datasette-webhook-write/blob/e82440f372a2f2e3ed27d1bd34c9fa3a53b49b94/datasette_webhook_write/__init__.py#L88-L89", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1087931918, "label": "`.execute_write(... block=True)` should be the default behaviour"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1578#issuecomment-1000471782", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1578", "id": 1000471782, "node_id": "IC_kwDOBm6k_c47ofzm", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-12-23T18:44:01Z", "updated_at": "2021-12-23T18:44:01Z", "author_association": "OWNER", "body": "The example nginx config on https://docs.datasette.io/en/stable/deploying.html#nginx-proxy-configuration is currently:\r\n\r\n```\r\ndaemon off;\r\n\r\nevents {\r\n worker_connections 1024;\r\n}\r\nhttp {\r\n server {\r\n listen 80;\r\n location /my-datasette {\r\n proxy_pass http://127.0.0.1:8009/my-datasette;\r\n proxy_set_header Host $host;\r\n }\r\n }\r\n}\r\n```\r\nThis looks to me like it might exhibit the bug. 
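One quick way to check a deployment would be to request a row page whose key contains an encoded character (URL and key invented for illustration):\r\n```\r\ncurl -I 'https://example.com/my-datasette/mydb/mytable/key%2Fwith%2Fslash'\r\n```\r\nIf nginx hands the backend a decoded URI, a request like that would 404. 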
Need to confirm that and figure out an alternative.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1087919372, "label": "Confirm if documented nginx proxy config works for row pages with escaped characters in their primary key"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1578#issuecomment-1000471371", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1578", "id": 1000471371, "node_id": "IC_kwDOBm6k_c47oftL", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-12-23T18:42:50Z", "updated_at": "2021-12-23T18:42:50Z", "author_association": "OWNER", "body": "Confirmed, that fixed the bug for me on my server.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1087919372, "label": "Confirm if documented nginx proxy config works for row pages with escaped characters in their primary key"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1578#issuecomment-1000470652", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1578", "id": 1000470652, "node_id": "IC_kwDOBm6k_c47ofh8", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-12-23T18:40:46Z", "updated_at": "2021-12-23T18:40:46Z", "author_association": "OWNER", "body": "[This StackOverflow answer](https://serverfault.com/a/463932) suggests that the fix is to change this:\r\n\r\n proxy_pass http://127.0.0.1:8000/;\r\n\r\nTo this:\r\n\r\n proxy_pass http://127.0.0.1:8000;\r\n\r\nQuoting the nginx documentation: http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_pass\r\n\r\n> A request URI is passed to the server as follows:\r\n> \r\n> - If the `proxy_pass` directive is specified with a URI, then when a request is passed to the server, the part of a [normalized](http://nginx.org/en/docs/http/ngx_http_core_module.html#location) request URI matching the location is replaced by a URI specified in the directive:\r\n> \r\n> location /name/ {\r\n> proxy_pass http://127.0.0.1/remote/;\r\n> }\r\n> \r\n> - If `proxy_pass` is specified without a URI, the request URI is passed to the server in the same form as sent by a client when the original request is processed, or the full normalized request URI is passed when processing the changed URI:\r\n> \r\n> location /some/path/ {\r\n> proxy_pass http://127.0.0.1;\r\n> }", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1087919372, "label": "Confirm if documented nginx proxy config works for row pages with escaped characters in their primary key"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1578#issuecomment-1000469107", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1578", "id": 1000469107, "node_id": "IC_kwDOBm6k_c47ofJz", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-12-23T18:36:38Z", "updated_at": "2021-12-23T18:36:38Z", "author_association": "OWNER", "body": "This problem doesn't occur on my `localhost` running Uvicorn directly - but I'm seeing it in my production environment that runs Datasette behind an nginx proxy:\r\n\r\n```\r\n location / {\r\n proxy_pass http://127.0.0.1:8000/;\r\n\tproxy_set_header Host 
$host;\r\n }\r\n```\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1087919372, "label": "Confirm if documented nginx proxy config works for row pages with escaped characters in their primary key"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1577#issuecomment-1000462309", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1577", "id": 1000462309, "node_id": "IC_kwDOBm6k_c47odfl", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-12-23T18:20:46Z", "updated_at": "2021-12-23T18:20:46Z", "author_association": "OWNER", "body": "There are a lot of improvements to `asyncio` in 3.7: https://docs.python.org/3/whatsnew/3.7.html#whatsnew37-asyncio", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1087913724, "label": "Drop support for Python 3.6"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1577#issuecomment-1000461900", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1577", "id": 1000461900, "node_id": "IC_kwDOBm6k_c47odZM", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-12-23T18:19:44Z", "updated_at": "2021-12-23T18:19:44Z", "author_association": "OWNER", "body": "The 3.7 feature I want to use today is [contextvars](https://docs.python.org/3/library/contextvars.html) - but I have a workaround for the moment, see https://github.com/simonw/datasette/issues/1576#issuecomment-999987418\r\n\r\nSo I'm going to hold off on dropping 3.6 for a little bit longer. I imagine I'll drop it before Datasette 1.0 though.\r\n\r\nLeaving this issue open to gather thoughts and feedback on this issue from Datasette users and potential users.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1087913724, "label": "Drop support for Python 3.6"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1577#issuecomment-1000461275", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1577", "id": 1000461275, "node_id": "IC_kwDOBm6k_c47odPb", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-12-23T18:18:11Z", "updated_at": "2021-12-23T18:18:11Z", "author_association": "OWNER", "body": "From the Twitter thread, there are still a decent amount of LTS Linux releases out there that are stuck on pre-3.7 Python.\r\n\r\nThough many of those are 3.5 and Datasette dropped support for 3.5 in November 2019: cf7776d36fbacefa874cbd6e5fcdc9fff7661203", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1087913724, "label": "Drop support for Python 3.6"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1576#issuecomment-999990414", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1576", "id": 999990414, "node_id": "IC_kwDOBm6k_c47mqSO", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-12-23T02:08:39Z", "updated_at": "2021-12-23T18:16:35Z", "author_association": "OWNER", "body": "It's tiny: I'm tempted to vendor it. 
https://github.com/Skyscanner/aiotask-context/blob/master/aiotask_context/__init__.py\r\n\r\nNo, I'll add it as a pinned dependency, which I can then drop when I drop 3.6 support.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1087181951, "label": "Traces should include SQL executed by subtasks created with `asyncio.gather`"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1576#issuecomment-999987418", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1576", "id": 999987418, "node_id": "IC_kwDOBm6k_c47mpja", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-12-23T01:59:58Z", "updated_at": "2021-12-23T02:02:12Z", "author_association": "OWNER", "body": "Another option: https://github.com/Skyscanner/aiotask-context - looks like it might be better as it's been updated for Python 3.7 in this commit https://github.com/Skyscanner/aiotask-context/commit/67108c91d2abb445655cc2af446fdb52ca7890c4\r\n\r\nThe Skyscanner one doesn't attempt to wrap any existing factories, but that's OK for my purposes since I don't need to handle arbitrary `asyncio` code written by other people.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1087181951, "label": "Traces should include SQL executed by subtasks created with `asyncio.gather`"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1576#issuecomment-999876666", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1576", "id": 999876666, "node_id": "IC_kwDOBm6k_c47mOg6", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-12-22T20:59:22Z", "updated_at": "2021-12-22T21:18:09Z", "author_association": "OWNER", "body": "This article is relevant: [Context information storage for asyncio](https://blog.sqreen.com/asyncio/) - in particular the section https://blog.sqreen.com/asyncio/#context-inheritance-between-tasks which describes exactly the problem I have and their solution, which involves this trickery:\r\n\r\n```python\r\ndef request_task_factory(loop, coro):\r\n child_task = asyncio.tasks.Task(coro, loop=loop)\r\n parent_task = asyncio.Task.current_task(loop=loop)\r\n current_request = getattr(parent_task, 'current_request', None)\r\n setattr(child_task, 'current_request', current_request)\r\n return child_task\r\n\r\nloop = asyncio.get_event_loop()\r\nloop.set_task_factory(request_task_factory)\r\n```\r\n\r\nThey released their solution as a library: https://pypi.org/project/aiocontext/ and https://github.com/sqreen/AioContext - but that company was acquired by Datadog back in April and doesn't seem to be actively maintaining their open source stuff any more: https://twitter.com/SqreenIO/status/1384906075506364417", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1087181951, "label": "Traces should include SQL executed by subtasks created with `asyncio.gather`"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1576#issuecomment-999878907", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1576", "id": 999878907, "node_id": "IC_kwDOBm6k_c47mPD7", "user": {"value": 9599, "label": "simonw"}, "created_at": 
"2021-12-22T21:03:49Z", "updated_at": "2021-12-22T21:10:46Z", "author_association": "OWNER", "body": "`context_vars` can solve this but they were introduced in Python 3.7: https://www.python.org/dev/peps/pep-0567/\r\n\r\nPython 3.6 support ends in a few days time, and it looks like Glitch has updated to 3.7 now - so maybe I can get away with Datasette needing 3.7 these days?\r\n\r\nTweeted about that here: https://twitter.com/simonw/status/1473761478155010048", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1087181951, "label": "Traces should include SQL executed by subtasks created with `asyncio.gather`"}, "performed_via_github_app": null}