{"html_url": "https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008557414", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/364", "id": 1008557414, "node_id": "IC_kwDOCGYnMM48HV1m", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-10T05:36:19Z", "updated_at": "2022-01-10T05:36:19Z", "author_association": "OWNER", "body": "That did the trick.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1095570074, "label": "`--batch-size 1` doesn't seem to commit for every item"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008546573", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/364", "id": 1008546573, "node_id": "IC_kwDOCGYnMM48HTMN", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-10T05:05:15Z", "updated_at": "2022-01-10T05:05:15Z", "author_association": "OWNER", "body": "Bit nasty but it might work:\r\n```python\r\n def try_until(expected):\r\n tries = 0\r\n while True:\r\n rows = list(Database(db_path)[\"rows\"].rows)\r\n if rows == expected:\r\n return\r\n tries += 1\r\n if tries > 10:\r\n assert False, \"Expected {}, got {}\".format(expected, rows)\r\n time.sleep(tries * 0.1)\r\n\r\n try_until([{\"name\": \"Azi\"}])\r\n proc.stdin.write(b'{\"name\": \"Suna\"}\\n')\r\n proc.stdin.flush()\r\n try_until([{\"name\": \"Azi\"}, {\"name\": \"Suna\"}])\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1095570074, "label": "`--batch-size 1` doesn't seem to commit for every item"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008545140", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/364", 
"id": 1008545140, "node_id": "IC_kwDOCGYnMM48HS10", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-10T05:01:34Z", "updated_at": "2022-01-10T05:01:34Z", "author_association": "OWNER", "body": "Urgh, tests are still failing intermittently - for example:\r\n```\r\n time.sleep(0.4)\r\n> assert list(Database(db_path)[\"rows\"].rows) == [{\"name\": \"Azi\"}]\r\nE AssertionError: assert [] == [{'name': 'Azi'}]\r\nE Right contains one more item: {'name': 'Azi'}\r\nE Full diff:\r\nE - [{'name': 'Azi'}]\r\nE + []\r\n```\r\nI'm going to change this code to keep on trying up to 10 seconds - that should get the tests to pass faster on most machines.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1095570074, "label": "`--batch-size 1` doesn't seem to commit for every item"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008537194", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/364", "id": 1008537194, "node_id": "IC_kwDOCGYnMM48HQ5q", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-10T04:29:53Z", "updated_at": "2022-01-10T04:31:29Z", "author_association": "OWNER", "body": "After a bunch of debugging with `print()` statements it's clear that the problem isn't with when things are committed or the size of the batches - it's that the data sent to standard input is all being processed in one go, not a line at a time.\r\n\r\nI think that's because it is being buffered by this: https://github.com/simonw/sqlite-utils/blob/d2a79d200f9071a86027365fa2a576865b71064f/sqlite_utils/cli.py#L759-L770\r\n\r\nThe buffering is there so that we can sniff the first few bytes to detect if it's a CSV file - added in 99ff0a288c08ec2071139c6031eb880fa9c95310 for #230. 
So maybe for non-CSV inputs we should disable buffering?", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1095570074, "label": "`--batch-size 1` doesn't seem to commit for every item"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008526736", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/364", "id": 1008526736, "node_id": "IC_kwDOCGYnMM48HOWQ", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-10T04:07:29Z", "updated_at": "2022-01-10T04:07:29Z", "author_association": "OWNER", "body": "I think this test is right:\r\n```python\r\ndef test_insert_streaming_batch_size_1(db_path):\r\n # https://github.com/simonw/sqlite-utils/issues/364\r\n # Streaming with --batch-size 1 should commit on each record\r\n # Can't use CliRunner().invoke() here because we need to\r\n # run assertions in between writing to process stdin\r\n proc = subprocess.Popen(\r\n [\r\n sys.executable,\r\n \"-m\",\r\n \"sqlite_utils\",\r\n \"insert\",\r\n db_path,\r\n \"rows\",\r\n \"-\",\r\n \"--nl\",\r\n \"--batch-size\",\r\n \"1\",\r\n ],\r\n stdin=subprocess.PIPE,\r\n )\r\n proc.stdin.write(b'{\"name\": \"Azi\"}')\r\n proc.stdin.flush()\r\n assert list(Database(db_path)[\"rows\"].rows) == [{\"name\": \"Azi\"}]\r\n proc.stdin.write(b'{\"name\": \"Suna\"}')\r\n proc.stdin.flush()\r\n assert list(Database(db_path)[\"rows\"].rows) == [{\"name\": \"Azi\"}, {\"name\": \"Suna\"}]\r\n proc.stdin.close()\r\n proc.wait()\r\n```\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1095570074, "label": "`--batch-size 1` doesn't seem to commit for every item"}, "performed_via_github_app": null} {"html_url": 
"https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008234293", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/364", "id": 1008234293, "node_id": "IC_kwDOCGYnMM48GG81", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-09T05:37:02Z", "updated_at": "2022-01-09T05:37:02Z", "author_association": "OWNER", "body": "Calling `p.stdin.close()` and then `p.wait()` terminates the subprocess.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1095570074, "label": "`--batch-size 1` doesn't seem to commit for every item"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008233910", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/364", "id": 1008233910, "node_id": "IC_kwDOCGYnMM48GG22", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-09T05:32:53Z", "updated_at": "2022-01-09T05:35:45Z", "author_association": "OWNER", "body": "This is strange. 
The following:\r\n```pycon\r\n>>> import subprocess\r\n>>> p = subprocess.Popen([\"sqlite-utils\", \"insert\", \"/tmp/stream.db\", \"stream\", \"-\", \"--nl\"], stdin=subprocess.PIPE)\r\n>>> p.stdin.write(b'\\n'.join(b'{\"id\": %s}' % str(i).encode(\"utf-8\") for i in range(1000)))\r\n11889\r\n>>> # At this point /tmp/stream.db is still 0 bytes - but if I then run this:\r\n>>> p.stdin.close()\r\n>>> # /tmp/stream.db is now 20K and contains the written data\r\n```\r\nNo wait, mystery solved - I can add `p.stdin.flush()` instead of `p.stdin.close()` and the file suddenly jumps up in size.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1095570074, "label": "`--batch-size 1` doesn't seem to commit for every item"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008216201", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/364", "id": 1008216201, "node_id": "IC_kwDOCGYnMM48GCiJ", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-09T02:34:12Z", "updated_at": "2022-01-09T02:34:12Z", "author_association": "OWNER", "body": "I can now write tests that look like this: https://github.com/simonw/sqlite-utils/blob/539f5ccd90371fa87f946018f8b77d55929e06db/tests/test_cli.py#L2024-L2030\r\n\r\nWhich means I can write a test that exercises this bug.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1095570074, "label": "`--batch-size 1` doesn't seem to commit for every item"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008214998", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/364", "id": 1008214998, "node_id": "IC_kwDOCGYnMM48GCPW", "user": {"value": 9599, 
"label": "simonw"}, "created_at": "2022-01-09T02:23:20Z", "updated_at": "2022-01-09T02:23:20Z", "author_association": "OWNER", "body": "Possible way of running the test: add this to `sqlite_utils/cli.py`:\r\n\r\n```python\r\nif __name__ == \"__main__\":\r\n cli()\r\n```\r\nNow the tool can be run using `python -m sqlite_utils.cli --help`\r\n\r\nThen in the test use `subprocess` to call `sys.executable` (the path to the current Python interpreter) and pass it `-m sqlite_utils.cli` to run the script!", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1095570074, "label": "`--batch-size 1` doesn't seem to commit for every item"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008214406", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/364", "id": 1008214406, "node_id": "IC_kwDOCGYnMM48GCGG", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-09T02:18:21Z", "updated_at": "2022-01-09T02:18:21Z", "author_association": "OWNER", "body": "I'm having trouble figuring out the best way to write a unit test for this. 
Filed a relevant feature request for Click here:\r\n- https://github.com/pallets/click/issues/2171", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1095570074, "label": "`--batch-size 1` doesn't seem to commit for every item"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008155916", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/364", "id": 1008155916, "node_id": "IC_kwDOCGYnMM48Fz0M", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-08T21:16:46Z", "updated_at": "2022-01-08T21:16:46Z", "author_association": "OWNER", "body": "No, `chunks()` seems to work OK in the test I just added.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1095570074, "label": "`--batch-size 1` doesn't seem to commit for every item"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008154873", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/364", "id": 1008154873, "node_id": "IC_kwDOCGYnMM48Fzj5", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-08T21:11:55Z", "updated_at": "2022-01-08T21:11:55Z", "author_association": "OWNER", "body": "I'm suspicious that the `chunks()` utility function may not be working correctly:\r\n```pycon\r\nIn [10]: [list(d) for d in list(chunks('abc', 5))]\r\nOut[10]: [['a'], ['b'], ['c']]\r\n\r\nIn [11]: [list(d) for d in list(chunks('abcdefghi', 5))]\r\nOut[11]: [['a'], ['b'], ['c'], ['d'], ['e'], ['f'], ['g'], ['h'], ['i']]\r\n\r\nIn [12]: [list(d) for d in list(chunks('abcdefghi', 3))]\r\nOut[12]: [['a'], ['b'], ['c'], ['d'], ['e'], ['f'], ['g'], ['h'], ['i']]\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, 
\"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1095570074, "label": "`--batch-size 1` doesn't seem to commit for every item"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008153586", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/364", "id": 1008153586, "node_id": "IC_kwDOCGYnMM48FzPy", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-08T21:06:15Z", "updated_at": "2022-01-08T21:06:15Z", "author_association": "OWNER", "body": "I added a print statement after `for query, params in queries_and_params` and confirmed that something in the code is waiting until 16 records are available to be inserted and then executing the inserts, even with `--batch-size 1`.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1095570074, "label": "`--batch-size 1` doesn't seem to commit for every item"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008151884", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/364", "id": 1008151884, "node_id": "IC_kwDOCGYnMM48Fy1M", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-08T20:59:21Z", "updated_at": "2022-01-08T20:59:21Z", "author_association": "OWNER", "body": "(That Heroku example doesn't record the timestamp, which limits its usefulness)", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1095570074, "label": "`--batch-size 1` doesn't seem to commit for every item"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008143248", "issue_url": 
"https://api.github.com/repos/simonw/sqlite-utils/issues/364", "id": 1008143248, "node_id": "IC_kwDOCGYnMM48FwuQ", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-08T20:34:12Z", "updated_at": "2022-01-08T20:34:12Z", "author_association": "OWNER", "body": "Built that tool: https://github.com/simonw/stream-delay and https://pypi.org/project/stream-delay/", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1095570074, "label": "`--batch-size 1` doesn't seem to commit for every item"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008129841", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/364", "id": 1008129841, "node_id": "IC_kwDOCGYnMM48Ftcx", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-08T20:04:42Z", "updated_at": "2022-01-08T20:04:42Z", "author_association": "OWNER", "body": "It would be easier to test this if I had a utility for streaming out a file one line at a time.\r\n\r\nA few recipes for this in https://superuser.com/questions/526242/cat-file-to-terminal-at-particular-speed-of-lines-per-second - I'm going to build a quick `stream-delay` tool though.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1095570074, "label": "`--batch-size 1` doesn't seem to commit for every item"}, "performed_via_github_app": null}