{"html_url": "https://github.com/simonw/datasette/issues/607#issuecomment-546752311", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/607", "id": 546752311, "node_id": "MDEyOklzc3VlQ29tbWVudDU0Njc1MjMxMQ==", "user": {"value": 8431341, "label": "zeluspudding"}, "created_at": "2019-10-28T00:37:10Z", "updated_at": "2019-10-28T00:37:10Z", "author_association": "NONE", "body": "UPDATE:\r\nFollowing the tips in [Squeezing Performance from SQLite: Indexes? Indexes!](https://medium.com/@JasonWyatt/squeezing-performance-from-sqlite-indexes-indexes-c4e175f3c346), I added an index to my large table and benchmarked query speeds for three cases: returning `all rows`, `rows exactly equal to 'Musk Elon'`, and `rows like 'musk'`. Indexing reduced query time for each of those measures and **dramatically** reduced the time to return `rows exactly equal to 'Musk Elon'`, as shown below:\r\n\r\n> table: edgar_idx\r\n> rows: 16,428,090\r\n> **indexed: False**\r\n> Return all rows where company name exactly equal to Musk Elon\r\n> query: select rowid, * from edgar_idx where \"company\" = :p0 order by rowid limit 101\r\n> query time: Query took 21821.031ms\r\n> \r\n> Return all rows where company name contains Musk\r\n> query: select rowid, * from edgar_idx where \"company\" like :p0 order by rowid limit 101\r\n> query time: Query took 20505.029ms\r\n> \r\n> Return everything\r\n> query: select rowid, * from edgar_idx order by rowid limit 101\r\n> query time: Query took 7985.011ms\r\n> \r\n> **indexed: True**\r\n> Return all rows where company name exactly equal to Musk Elon\r\n> query: select rowid, * from edgar_idx where \"company\" = :p0 order by rowid limit 101\r\n> query time: Query took 30.0ms\r\n> \r\n> Return all rows where company name contains Musk\r\n> query: select rowid, * from edgar_idx where \"company\" like :p0 order by rowid limit 101\r\n> query time: Query took 13340.019ms\r\n> \r\n> Return everything\r\n> query: select rowid, 
* from edgar_idx order by rowid limit 101\r\n> query time: Query took 2190.003ms\r\n\r\nSo indexing reduced query time for an exact match to \"Musk Elon\" from almost `22 seconds` to `30.0ms`. **That's amazing and truly promising!** However, an autocomplete feature relies on fuzzy / incomplete matching, which is closer to the `contains 'musk'` query. Unfortunately, that still takes about 13 seconds even after indexing, presumably because a `like` pattern with a leading wildcard cannot use the index and still has to scan the table. So the hunt for a fast fuzzy / autocomplete search capability continues.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 512996469, "label": "Ways to improve fuzzy search speed on larger data sets?"}, "performed_via_github_app": null}