{"html_url": "https://github.com/simonw/sqlite-utils/issues/147#issuecomment-779416619", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/147", "id": 779416619, "node_id": "MDEyOklzc3VlQ29tbWVudDc3OTQxNjYxOQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-15T19:40:57Z", "updated_at": "2021-02-15T21:27:55Z", "author_association": "OWNER", "body": "Tried this experiment (not proper binary search, it only searches downwards):\r\n```python\r\nimport sqlite3\r\n\r\ndb = sqlite3.connect(\":memory:\")\r\n\r\ndef tryit(n):\r\n sql = \"select 1 where 1 in ({})\".format(\", \".join(\"?\" for i in range(n)))\r\n db.execute(sql, [0 for i in range(n)])\r\n\r\n\r\ndef find_limit(min=0, max=5_000_000):\r\n value = max\r\n while True:\r\n print('Trying', value)\r\n try:\r\n tryit(value)\r\n return value\r\n except:\r\n value = value // 2\r\n```\r\nRunning `find_limit()` with those default parameters takes about 1.47s on my laptop:\r\n```\r\nIn [9]: %timeit find_limit()\r\nTrying 5000000\r\nTrying 2500000...\r\n1.47 s \u00b1 28 ms per loop (mean \u00b1 std. dev. of 7 runs, 1 loop each)\r\n```\r\nInterestingly the value it suggested was 156250 - suggesting that the macOS `sqlite3` binary with a 500,000 limit isn't the same as whatever my Python is using here.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 688670158, "label": "SQLITE_MAX_VARS maybe hard-coded too low"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/147#issuecomment-779448912", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/147", "id": 779448912, "node_id": "MDEyOklzc3VlQ29tbWVudDc3OTQ0ODkxMg==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-15T21:09:50Z", "updated_at": "2021-02-15T21:09:50Z", "author_association": "OWNER", "body": "I fiddled around and replaced that line with `batch_size = SQLITE_MAX_VARS // num_columns` - which evaluated to `10416` for this particular file. That got me this:\r\n\r\n 40.71s user 1.81s system 98% cpu 43.081 total\r\n\r\n43s is definitely better than 56s, but it's still not as big as the ~26.5s to ~3.5s improvement described by @simonwiles at the top of this issue. I wonder what I'm missing here.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 688670158, "label": "SQLITE_MAX_VARS maybe hard-coded too low"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/147#issuecomment-779446652", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/147", "id": 779446652, "node_id": "MDEyOklzc3VlQ29tbWVudDc3OTQ0NjY1Mg==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-15T21:04:19Z", "updated_at": "2021-02-15T21:04:19Z", "author_association": "OWNER", "body": "... 
{"html_url": "https://github.com/simonw/sqlite-utils/issues/147#issuecomment-779446652", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/147", "id": 779446652, "node_id": "MDEyOklzc3VlQ29tbWVudDc3OTQ0NjY1Mg==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-15T21:04:19Z", "updated_at": "2021-02-15T21:04:19Z", "author_association": "OWNER", "body": "... but it looks like `batch_size` is hard-coded to 100, rather than `None` - which means it's not being calculated using that value:\r\n\r\nhttps://github.com/simonw/sqlite-utils/blob/1f49f32814a942fa076cfe5f504d1621188097ed/sqlite_utils/db.py#L704\r\n\r\nAnd:\r\n\r\nhttps://github.com/simonw/sqlite-utils/blob/1f49f32814a942fa076cfe5f504d1621188097ed/sqlite_utils/db.py#L1877", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 688670158, "label": "SQLITE_MAX_VARS maybe hard-coded too low"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/sqlite-utils/issues/147#issuecomment-779445423", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/147", "id": 779445423, "node_id": "MDEyOklzc3VlQ29tbWVudDc3OTQ0NTQyMw==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-15T21:00:44Z", "updated_at": "2021-02-15T21:01:09Z", "author_association": "OWNER", "body": "I tried changing the hard-coded value from 999 to 156_250 and running `sqlite-utils insert` against a 500MB CSV file, with these results:\r\n```\r\n(sqlite-utils) sqlite-utils % time sqlite-utils insert slow-ethos.db ethos ../ethos-datasette/ethos.csv --no-headers\r\n [###################################-] 99% 00:00:00sqlite-utils insert slow-ethos.db ethos ../ethos-datasette/ethos.csv\r\n44.74s user 7.61s system 92% cpu 56.601 total\r\n# Increased the setting here\r\n(sqlite-utils) sqlite-utils % time sqlite-utils insert fast-ethos.db ethos ../ethos-datasette/ethos.csv --no-headers\r\n [###################################-] 99% 00:00:00sqlite-utils insert fast-ethos.db ethos ../ethos-datasette/ethos.csv\r\n39.40s user 5.15s system 96% cpu 46.320 total\r\n```\r\nNot as big a difference as I was expecting.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 688670158, "label": "SQLITE_MAX_VARS maybe hard-coded too low"}, "performed_via_github_app": null}
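To make the batch-size arithmetic in the comments above concrete, here is a sketch of the calculation being discussed. `SQLITE_MAX_VARS` mirrors the constant named in the comments; the helper itself and the 15-column assumption are illustrative, not the actual `db.py` code:

```python
SQLITE_MAX_VARS = 999  # the hard-coded default discussed in this issue

def effective_batch_size(num_columns, batch_size=None):
    # Derive a batch size so each multi-row INSERT stays under the variable
    # limit: batch_size rows * num_columns bound parameters per row
    if batch_size is None:
        batch_size = max(1, SQLITE_MAX_VARS // num_columns)
    return batch_size

# With the default limit of 999 and a 15-column CSV: 66 rows per INSERT
print(effective_batch_size(15))
# Raising the limit to 156_250 gives 156_250 // 15 = 10416 rows per INSERT,
# consistent with the `10416` figure quoted above for that particular file
print(156_250 // 15)
```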
{"html_url": "https://github.com/simonw/sqlite-utils/issues/147#issuecomment-779417723", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/147", "id": 779417723, "node_id": "MDEyOklzc3VlQ29tbWVudDc3OTQxNzcyMw==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-15T19:44:02Z", "updated_at": "2021-02-15T19:47:00Z", "author_association": "OWNER", "body": "`%timeit find_limit(max=1_000_000)` reported 378 ms per loop on my laptop\r\n\r\n`%timeit find_limit(max=500_000)` reported 197 ms per loop\r\n\r\n`%timeit find_limit(max=200_000)` reported 53 ms per loop\r\n\r\n`%timeit find_limit(max=100_000)` reported 26.8 ms per loop.\r\n\r\nAll of these are still slow enough that I'm not comfortable running this search every time the library is imported. Allowing users to opt in to this as a performance enhancement might be better.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 688670158, "label": "SQLITE_MAX_VARS maybe hard-coded too low"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/sqlite-utils/issues/147#issuecomment-779409770", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/147", "id": 779409770, "node_id": "MDEyOklzc3VlQ29tbWVudDc3OTQwOTc3MA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-02-15T19:23:11Z", "updated_at": "2021-02-15T19:23:11Z", "author_association": "OWNER", "body": "On my Mac right now I'm seeing a limit of 500,000:\r\n```\r\n% sqlite3 -cmd \".limits variable_number\"\r\n variable_number 500000\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 688670158, "label": "SQLITE_MAX_VARS maybe hard-coded too low"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/sqlite-utils/issues/147#issuecomment-683528149", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/147", "id": 683528149, "node_id": "MDEyOklzc3VlQ29tbWVudDY4MzUyODE0OQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-08-31T03:17:26Z", "updated_at": "2020-08-31T03:17:26Z", "author_association": "OWNER", "body": "+1 to making this something that users can customize. An optional argument to the `Database` constructor would be a neat way to do this.\r\n\r\nI think there's a terrifying way that we could find this value... we could perform a binary search for it! Open up an in-memory connection, try running different bulk inserts against it and catch the exceptions - then adjust and try again.\r\n\r\nMy hunch is that we could perform just 2 or 3 probes (maybe against carefully selected values) to find the highest value that works. If this process took less than a few ms to run I'd be happy to do it automatically when the class is instantiated (and let users disable that automatic probing by setting a value using the constructor argument).", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 688670158, "label": "SQLITE_MAX_VARS maybe hard-coded too low"}, "performed_via_github_app": null}
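A minimal sketch of the constructor-time probing idea floated in the last comment. The `Database` wrapper, `PROBE_VALUES` list, and `sqlite_max_vars` argument are hypothetical illustrations rather than the actual sqlite-utils API; the `Connection.getlimit()` shortcut only exists on Python 3.11+:

```python
import sqlite3

# Hypothetical "carefully selected" probe values, highest first
PROBE_VALUES = [500_000, 250_000, 32_766, 999]

def probe_max_vars(conn):
    # Python 3.11+ exposes the limit directly - no probing required
    if hasattr(conn, "getlimit"):
        return conn.getlimit(sqlite3.SQLITE_LIMIT_VARIABLE_NUMBER)
    for candidate in PROBE_VALUES:
        sql = "select 1 where 1 in ({})".format(", ".join("?" * candidate))
        try:
            conn.execute(sql, [0] * candidate)
            return candidate
        except sqlite3.OperationalError:
            continue  # too many SQL variables - try the next smaller value
    return 999  # conservative fallback matching the old hard-coded default

class Database:
    def __init__(self, path=":memory:", sqlite_max_vars=None):
        self.conn = sqlite3.connect(path)
        # An explicit value passed to the constructor skips the automatic probe
        self.sqlite_max_vars = sqlite_max_vars or probe_max_vars(self.conn)
```

Usage under these assumptions would be `Database("data.db")` to probe automatically, or `Database("data.db", sqlite_max_vars=999)` to opt out and keep the conservative default.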