html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,issue,performed_via_github_app
https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008526736,https://api.github.com/repos/simonw/sqlite-utils/issues/364,1008526736,IC_kwDOCGYnMM48HOWQ,9599,2022-01-10T04:07:29Z,2022-01-10T04:07:29Z,OWNER,"I think this test is right:
```python
def test_insert_streaming_batch_size_1(db_path):
    # https://github.com/simonw/sqlite-utils/issues/364
    # Streaming with --batch-size 1 should commit on each record
    # Can't use CliRunner().invoke() here because we need to
    # run assertions in between writing to process stdin
    proc = subprocess.Popen(
        [
            sys.executable,
            ""-m"",
            ""sqlite_utils"",
            ""insert"",
            db_path,
            ""rows"",
            ""-"",
            ""--nl"",
            ""--batch-size"",
            ""1"",
        ],
        stdin=subprocess.PIPE,
    )
    # Each newline-delimited JSON record must end with a newline
    proc.stdin.write(b'{""name"": ""Azi""}\n')
    proc.stdin.flush()
    assert list(Database(db_path)[""rows""].rows) == [{""name"": ""Azi""}]
    proc.stdin.write(b'{""name"": ""Suna""}\n')
    proc.stdin.flush()
    assert list(Database(db_path)[""rows""].rows) == [{""name"": ""Azi""}, {""name"": ""Suna""}]
    proc.stdin.close()
    proc.wait()
```
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1095570074,
https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008537194,https://api.github.com/repos/simonw/sqlite-utils/issues/364,1008537194,IC_kwDOCGYnMM48HQ5q,9599,2022-01-10T04:29:53Z,2022-01-10T04:31:29Z,OWNER,"After a bunch of debugging with `print()` statements it's clear that the problem isn't with when things are committed or the size of the batches - it's that the data sent to standard input is all being processed in one go, not a line at a time.
I think that's because it is being buffered by this: https://github.com/simonw/sqlite-utils/blob/d2a79d200f9071a86027365fa2a576865b71064f/sqlite_utils/cli.py#L759-L770
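For context, a buffered reader can expose the first bytes of a stream without consuming them, which is what makes sniffing the input format possible. A minimal sketch using only the standard library; the `sniff_prefix` helper is hypothetical and is not the code linked above:

```python
import io


def sniff_prefix(raw, size=16):
    # Hypothetical helper: wrap a binary stream in a BufferedReader and
    # look at up to `size` leading bytes without advancing the read position.
    buffered = io.BufferedReader(raw, buffer_size=4096)
    # peek() may return more or fewer bytes than requested, so slice it.
    prefix = buffered.peek(size)[:size]
    return buffered, prefix


# An in-memory stream stands in for standard input here.
raw = io.BytesIO(b'id,name\n1,Cleo\n2,Chicken\n')
buffered, prefix = sniff_prefix(raw)

# The peeked bytes are still available to the next read.
first_line = buffered.readline()
```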
The buffering is there so that we can sniff the first few bytes to detect if it's a CSV file - added in 99ff0a288c08ec2071139c6031eb880fa9c95310 for #230. So maybe for non-CSV inputs we should disable buffering?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1095570074,
https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008545140,https://api.github.com/repos/simonw/sqlite-utils/issues/364,1008545140,IC_kwDOCGYnMM48HS10,9599,2022-01-10T05:01:34Z,2022-01-10T05:01:34Z,OWNER,"Urgh, tests are still failing intermittently - for example:
```
time.sleep(0.4)
> assert list(Database(db_path)[""rows""].rows) == [{""name"": ""Azi""}]
E AssertionError: assert [] == [{'name': 'Azi'}]
E Right contains one more item: {'name': 'Azi'}
E Full diff:
E - [{'name': 'Azi'}]
E + []
```
I'm going to change this code to keep on trying up to 10 seconds - that should get the tests to pass faster on most machines.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1095570074,
https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008546573,https://api.github.com/repos/simonw/sqlite-utils/issues/364,1008546573,IC_kwDOCGYnMM48HTMN,9599,2022-01-10T05:05:15Z,2022-01-10T05:05:15Z,OWNER,"Bit nasty but it might work:
```python
def try_until(expected):
    tries = 0
    while True:
        rows = list(Database(db_path)[""rows""].rows)
        if rows == expected:
            return
        tries += 1
        if tries > 10:
            assert False, ""Expected {}, got {}"".format(expected, rows)
        time.sleep(tries * 0.1)

try_until([{""name"": ""Azi""}])
proc.stdin.write(b'{""name"": ""Suna""}\n')
proc.stdin.flush()
try_until([{""name"": ""Azi""}, {""name"": ""Suna""}])
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1095570074,
https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008557414,https://api.github.com/repos/simonw/sqlite-utils/issues/364,1008557414,IC_kwDOCGYnMM48HV1m,9599,2022-01-10T05:36:19Z,2022-01-10T05:36:19Z,OWNER,That did the trick.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1095570074,
https://github.com/simonw/sqlite-utils/issues/366#issuecomment-1009273525,https://api.github.com/repos/simonw/sqlite-utils/issues/366,1009273525,IC_kwDOCGYnMM48KEq1,9599,2022-01-10T19:32:39Z,2022-01-10T19:32:39Z,OWNER,"I'm going to implement the Python library methods based on the prototype:
```diff
commit 650f97a08f29a688c530e5f6c9eedc9269ed7bdc
Author: Simon Willison
Date:   Sat Jan 8 13:34:01 2022 -0800

    Initial prototype of .analyze(), refs #366

diff --git a/sqlite_utils/db.py b/sqlite_utils/db.py
index dfc4723..1348b4a 100644
--- a/sqlite_utils/db.py
+++ b/sqlite_utils/db.py
@@ -923,6 +923,13 @@ class Database:
         ""Run a SQLite ``VACUUM`` against the database.""
         self.execute(""VACUUM;"")
 
+    def analyze(self, name=None):
+        ""Run ``ANALYZE`` against the entire database or a named table or index.""
+        sql = ""ANALYZE""
+        if name is not None:
+            sql += "" [{}]"".format(name)
+        self.execute(sql)
+
 
 class Queryable:
     def exists(self) -> bool:
@@ -2902,6 +2909,10 @@ class Table(Queryable):
         )
         return self
 
+    def analyze(self):
+        ""Run ANALYZE against this table""
+        self.db.analyze(self.name)
+
     def analyze_column(
         self, column: str, common_limit: int = 10, value_truncate=None, total_rows=None
     ) -> ""ColumnDetails"":
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1096563265,
https://github.com/simonw/sqlite-utils/issues/366#issuecomment-1009285627,https://api.github.com/repos/simonw/sqlite-utils/issues/366,1009285627,IC_kwDOCGYnMM48KHn7,9599,2022-01-10T19:49:19Z,2022-01-10T19:51:25Z,OWNER,Documentation for those two new methods: https://sqlite-utils.datasette.io/en/latest/python-api.html#optimizing-index-usage-with-analyze,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1096563265,
https://github.com/simonw/sqlite-utils/issues/366#issuecomment-1009286373,https://api.github.com/repos/simonw/sqlite-utils/issues/366,1009286373,IC_kwDOCGYnMM48KHzl,9599,2022-01-10T19:50:22Z,2022-01-10T19:50:22Z,OWNER,"With respect to #365, I'm now thinking that having the ability to say ""... and then run ANALYZE"" could be useful for a bunch of Python methods. For example:
```python
db[""dogs""].insert_all(list_of_dogs, analyze=True)
db[""dogs""].create_index([""name""], analyze=True)
```
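To make the effect concrete, here is a minimal sketch using the standard library `sqlite3` module rather than sqlite-utils (the `analyze=True` parameter above is only a proposal at this point). It shows that running `ANALYZE` after inserting rows and creating an index populates the `sqlite_stat1` table that the query planner consults. The index name and sample rows are illustrative.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('create table dogs (id integer primary key, name text)')
conn.executemany(
    'insert into dogs (name) values (?)',
    [('Cleo',), ('Pancakes',), ('Cleo',)],
)
conn.execute('create index idx_dogs_name on dogs (name)')

# In a fresh database the statistics table does not exist yet.
before = conn.execute(
    'select count(*) from sqlite_master where name = ?', ('sqlite_stat1',)
).fetchone()[0]

# ANALYZE gathers statistics about indexes and stores them in sqlite_stat1.
conn.execute('ANALYZE')
stats = conn.execute('select tbl, idx, stat from sqlite_stat1').fetchall()
```

If the parameter were added, a call such as `insert_all(..., analyze=True)` would presumably amount to the insert followed by this `ANALYZE` step.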
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1096563265,
https://github.com/simonw/sqlite-utils/issues/366#issuecomment-1009288898,https://api.github.com/repos/simonw/sqlite-utils/issues/366,1009288898,IC_kwDOCGYnMM48KIbC,9599,2022-01-10T19:54:04Z,2022-01-10T19:54:04Z,OWNER,"Having browsed the API reference I think the methods that would benefit from an `analyze=True` parameter are:
- `table.create_index`
- `table.insert_all`
- `table.upsert_all`
- `table.delete_where`","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1096563265,
https://github.com/simonw/sqlite-utils/issues/375#issuecomment-1008556706,https://api.github.com/repos/simonw/sqlite-utils/issues/375,1008556706,IC_kwDOCGYnMM48HVqi,9599,2022-01-10T05:33:41Z,2022-01-10T05:33:41Z,OWNER,"I tested the prototype like this:

    sqlite-utils blah.db 'create table blah (id integer primary key, name text)'
    echo 'id,name
    1,Cleo
    2,Chicken' > blah.csv
    sqlite-utils bulk blah.db 'insert into blah (id, name) values (:id, :name)' blah.csv --csv
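
For comparison, roughly the same operation can be written with just the Python standard library. This is a sketch of the idea (CSV rows fed as named parameters to a single SQL statement), not how the `bulk` command is implemented; it uses an in-memory database and an inline CSV string in place of `blah.db` and `blah.csv`.

```python
import csv
import io
import sqlite3

csv_text = 'id,name\n1,Cleo\n2,Chicken\n'

conn = sqlite3.connect(':memory:')
conn.execute('create table blah (id integer primary key, name text)')

# DictReader yields one dict per row, keyed by the header line, which lines
# up with the :id and :name named parameters in the SQL statement.
rows = csv.DictReader(io.StringIO(csv_text))
with conn:
    conn.executemany('insert into blah (id, name) values (:id, :name)', rows)

result = conn.execute('select id, name from blah order by id').fetchall()
```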
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1097251014,
https://github.com/simonw/sqlite-utils/pull/367#issuecomment-1009272446,https://api.github.com/repos/simonw/sqlite-utils/issues/367,1009272446,IC_kwDOCGYnMM48KEZ-,9599,2022-01-10T19:31:08Z,2022-01-10T19:31:08Z,OWNER,I'm going to implement this in a separate commit from this PR.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1097041471,