{"html_url": "https://github.com/simonw/sqlite-utils/issues/555#issuecomment-1592047502", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/555", "id": 1592047502, "node_id": "IC_kwDOCGYnMM5e5LeO", "user": {"value": 7908073, "label": "chapmanjacobd"}, "created_at": "2023-06-14T22:00:10Z", "updated_at": "2023-06-14T22:01:57Z", "author_association": "CONTRIBUTOR", "body": "You may want to try doing a performance comparison between this and just selecting all the ids with few constraints and then doing the filtering within python.\r\n\r\nThat might seem like a lazy-programmer, inefficient way but queries with large resultsets are a different profile than what databases like SQLITE are designed for. That is not to say that SQLITE is slow or that python is always faster but when you start reading >20% of an index there is an equilibrium that is reached. Especially when adding in writing extra temp tables and stuff to memory/disk. And especially given the `NOT IN` style of query...\r\n\r\nYou may also try chunking like this:\r\n\r\n```py\r\ndef chunks(lst, n) -> Generator:\r\n for i in range(0, len(lst), n):\r\n yield lst[i : i + n]\r\n\r\nSQLITE_PARAM_LIMIT = 32765\r\n\r\ndata = []\r\nchunked = chunks(video_ids, consts.SQLITE_PARAM_LIMIT)\r\nfor ids in chunked:\r\n data.expand(\r\n list(\r\n db.query(\r\n f\"\"\"SELECT * from videos\r\n WHERE id in (\"\"\"\r\n + \",\".join([\"?\"] * len(ids))\r\n + \")\",\r\n (*ids,),\r\n )\r\n )\r\n )\r\n```\r\n\r\nbut that actually won't work with your `NOT IN` requirements. You need to query the full resultset to check any row.\r\n\r\nSince you are doing stuff with files/videos in SQLITE you might be interested in my side project: https://github.com/chapmanjacobd/library", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1733198948, "label": "Filter table by a large bunch of ids"}, "performed_via_github_app": null}