github
html_url | issue_url | id | node_id | user | created_at | updated_at | author_association | body | reactions | issue | performed_via_github_app |
---|---|---|---|---|---|---|---|---|---|---|---|
https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008129841 | https://api.github.com/repos/simonw/sqlite-utils/issues/364 | 1008129841 | IC_kwDOCGYnMM48Ftcx | 9599 | 2022-01-08T20:04:42Z | 2022-01-08T20:04:42Z | OWNER | It would be easier to test this if I had a utility for streaming out a file one line at a time. A few recipes for this in https://superuser.com/questions/526242/cat-file-to-terminal-at-particular-speed-of-lines-per-second - I'm going to build a quick `stream-delay` tool though. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1095570074 | |
https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008143248 | https://api.github.com/repos/simonw/sqlite-utils/issues/364 | 1008143248 | IC_kwDOCGYnMM48FwuQ | 9599 | 2022-01-08T20:34:12Z | 2022-01-08T20:34:12Z | OWNER | Built that tool: https://github.com/simonw/stream-delay and https://pypi.org/project/stream-delay/ | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1095570074 | |
https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008151884 | https://api.github.com/repos/simonw/sqlite-utils/issues/364 | 1008151884 | IC_kwDOCGYnMM48Fy1M | 9599 | 2022-01-08T20:59:21Z | 2022-01-08T20:59:21Z | OWNER | (That Heroku example doesn't record the timestamp, which limits its usefulness) | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1095570074 | |
https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008153586 | https://api.github.com/repos/simonw/sqlite-utils/issues/364 | 1008153586 | IC_kwDOCGYnMM48FzPy | 9599 | 2022-01-08T21:06:15Z | 2022-01-08T21:06:15Z | OWNER | I added a print statement after `for query, params in queries_and_params` and confirmed that something in the code is waiting until 16 records are available to be inserted and then executing the inserts, even with `--batch-size 1`. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1095570074 | |
https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008154873 | https://api.github.com/repos/simonw/sqlite-utils/issues/364 | 1008154873 | IC_kwDOCGYnMM48Fzj5 | 9599 | 2022-01-08T21:11:55Z | 2022-01-08T21:11:55Z | OWNER | I'm suspicious that the `chunks()` utility function may not be working correctly: ```pycon In [10]: [list(d) for d in list(chunks('abc', 5))] Out[10]: [['a'], ['b'], ['c']] In [11]: [list(d) for d in list(chunks('abcdefghi', 5))] Out[11]: [['a'], ['b'], ['c'], ['d'], ['e'], ['f'], ['g'], ['h'], ['i']] In [12]: [list(d) for d in list(chunks('abcdefghi', 3))] Out[12]: [['a'], ['b'], ['c'], ['d'], ['e'], ['f'], ['g'], ['h'], ['i']] ``` | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1095570074 | |
https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008155916 | https://api.github.com/repos/simonw/sqlite-utils/issues/364 | 1008155916 | IC_kwDOCGYnMM48Fz0M | 9599 | 2022-01-08T21:16:46Z | 2022-01-08T21:16:46Z | OWNER | No, `chunks()` seems to work OK in the test I just added. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1095570074 | |
https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008214406 | https://api.github.com/repos/simonw/sqlite-utils/issues/364 | 1008214406 | IC_kwDOCGYnMM48GCGG | 9599 | 2022-01-09T02:18:21Z | 2022-01-09T02:18:21Z | OWNER | I'm having trouble figuring out the best way to write a unit test for this. Filed a relevant feature request for Click here: - https://github.com/pallets/click/issues/2171 | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1095570074 | |
https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008214998 | https://api.github.com/repos/simonw/sqlite-utils/issues/364 | 1008214998 | IC_kwDOCGYnMM48GCPW | 9599 | 2022-01-09T02:23:20Z | 2022-01-09T02:23:20Z | OWNER | Possible way of running the test: add this to `sqlite_utils/cli.py`: ```python if __name__ == "__main__": cli() ``` Now the tool can be run using `python -m sqlite_utils.cli --help` Then in the test use `subprocess` to call `sys.executable` (the path to the current Python interpreter) and pass it `-m sqlite_utils.cli` to run the script! | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1095570074 | |
https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008216201 | https://api.github.com/repos/simonw/sqlite-utils/issues/364 | 1008216201 | IC_kwDOCGYnMM48GCiJ | 9599 | 2022-01-09T02:34:12Z | 2022-01-09T02:34:12Z | OWNER | I can now write tests that look like this: https://github.com/simonw/sqlite-utils/blob/539f5ccd90371fa87f946018f8b77d55929e06db/tests/test_cli.py#L2024-L2030 Which means I can write a test that exercises this bug. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1095570074 | |
https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008233910 | https://api.github.com/repos/simonw/sqlite-utils/issues/364 | 1008233910 | IC_kwDOCGYnMM48GG22 | 9599 | 2022-01-09T05:32:53Z | 2022-01-09T05:35:45Z | OWNER | This is strange. The following: ```pycon >>> import subprocess >>> p = subprocess.Popen(["sqlite-utils", "insert", "/tmp/stream.db", "stream", "-", "--nl"], stdin=subprocess.PIPE) >>> p.stdin.write(b'\n'.join(b'{"id": %s}' % str(i).encode("utf-8") for i in range(1000))) 11889 >>> # At this point /tmp/stream.db is still 0 bytes - but if I then run this: >>> p.stdin.close() >>> # /tmp/stream.db is now 20K and contains the written data ``` No wait, mystery solved - I can add `p.stdin.flush()` instead of `p.stdin.close()` and the file suddenly jumps up in size. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1095570074 | |
https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008234293 | https://api.github.com/repos/simonw/sqlite-utils/issues/364 | 1008234293 | IC_kwDOCGYnMM48GG81 | 9599 | 2022-01-09T05:37:02Z | 2022-01-09T05:37:02Z | OWNER | Calling `p.stdin.close()` and then `p.wait()` terminates the subprocess. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1095570074 | |
https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008526736 | https://api.github.com/repos/simonw/sqlite-utils/issues/364 | 1008526736 | IC_kwDOCGYnMM48HOWQ | 9599 | 2022-01-10T04:07:29Z | 2022-01-10T04:07:29Z | OWNER | I think this test is right: ```python def test_insert_streaming_batch_size_1(db_path): # https://github.com/simonw/sqlite-utils/issues/364 # Streaming with --batch-size 1 should commit on each record # Can't use CliRunner().invoke() here bacuse we need to # run assertions in between writing to process stdin proc = subprocess.Popen( [ sys.executable, "-m", "sqlite_utils", "insert", db_path, "rows", "-", "--nl", "--batch-size", "1", ], stdin=subprocess.PIPE, ) proc.stdin.write(b'{"name": "Azi"}') proc.stdin.flush() assert list(Database(db_path)["rows"].rows) == [{"name": "Azi"}] proc.stdin.write(b'{"name": "Suna"}') proc.stdin.flush() assert list(Database(db_path)["rows"].rows) == [{"name": "Azi"}, {"name": "Suna"}] proc.stdin.close() proc.wait() ``` | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1095570074 | |
https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008537194 | https://api.github.com/repos/simonw/sqlite-utils/issues/364 | 1008537194 | IC_kwDOCGYnMM48HQ5q | 9599 | 2022-01-10T04:29:53Z | 2022-01-10T04:31:29Z | OWNER | After a bunch of debugging with `print()` statements it's clear that the problem isn't with when things are committed or the size of the batches - it's that the data sent to standard input is all being processed in one go, not a line at a time. I think that's because it is being buffered by this: https://github.com/simonw/sqlite-utils/blob/d2a79d200f9071a86027365fa2a576865b71064f/sqlite_utils/cli.py#L759-L770 The buffering is there so that we can sniff the first few bytes to detect if it's a CSV file - added in 99ff0a288c08ec2071139c6031eb880fa9c95310 for #230. So maybe for non-CSV inputs we should disable buffering? | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1095570074 | |
https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008545140 | https://api.github.com/repos/simonw/sqlite-utils/issues/364 | 1008545140 | IC_kwDOCGYnMM48HS10 | 9599 | 2022-01-10T05:01:34Z | 2022-01-10T05:01:34Z | OWNER | Urgh, tests are still failing intermittently - for example: ``` time.sleep(0.4) > assert list(Database(db_path)["rows"].rows) == [{"name": "Azi"}] E AssertionError: assert [] == [{'name': 'Azi'}] E Right contains one more item: {'name': 'Azi'} E Full diff: E - [{'name': 'Azi'}] E + [] ``` I'm going to change this code to keep on trying up to 10 seconds - that should get the tests to pass faster on most machines. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1095570074 | |
https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008546573 | https://api.github.com/repos/simonw/sqlite-utils/issues/364 | 1008546573 | IC_kwDOCGYnMM48HTMN | 9599 | 2022-01-10T05:05:15Z | 2022-01-10T05:05:15Z | OWNER | Bit nasty but it might work: ```python def try_until(expected): tries = 0 while True: rows = list(Database(db_path)["rows"].rows) if rows == expected: return tries += 1 if tries > 10: assert False, "Expected {}, got {}".format(expected, rows) time.sleep(tries * 0.1) try_until([{"name": "Azi"}]) proc.stdin.write(b'{"name": "Suna"}\n') proc.stdin.flush() try_until([{"name": "Azi"}, {"name": "Suna"}]) ``` | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1095570074 | |
https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008557414 | https://api.github.com/repos/simonw/sqlite-utils/issues/364 | 1008557414 | IC_kwDOCGYnMM48HV1m | 9599 | 2022-01-10T05:36:19Z | 2022-01-10T05:36:19Z | OWNER | That did the trick. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
1095570074 |