{"html_url": "https://github.com/simonw/datasette/issues/699#issuecomment-626991001", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/699", "id": 626991001, "node_id": "MDEyOklzc3VlQ29tbWVudDYyNjk5MTAwMQ==", "user": {"value": 8431341, "label": "zeluspudding"}, "created_at": "2020-05-11T22:06:34Z", "updated_at": "2020-05-11T22:06:34Z", "author_association": "NONE", "body": "Very nice! Thank you for sharing that :+1: :) Will try it out!", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 582526961, "label": "Authentication (and permissions) as a core concept"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/699#issuecomment-626807487", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/699", "id": 626807487, "node_id": "MDEyOklzc3VlQ29tbWVudDYyNjgwNzQ4Nw==", "user": {"value": 8431341, "label": "zeluspudding"}, "created_at": "2020-05-11T16:23:57Z", "updated_at": "2020-05-11T16:24:59Z", "author_association": "NONE", "body": "`Authorization: bearer xxx` auth for API keys is a plus plus for me. Looked into just adding this into your `Flask` logic but learned this project doesn't use flask. Interesting \ud83e\udd14", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 582526961, "label": "Authentication (and permissions) as a core concept"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/607#issuecomment-550649607", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/607", "id": 550649607, "node_id": "MDEyOklzc3VlQ29tbWVudDU1MDY0OTYwNw==", "user": {"value": 8431341, "label": "zeluspudding"}, "created_at": "2019-11-07T03:38:10Z", "updated_at": "2019-11-07T03:38:10Z", "author_association": "NONE", "body": "I just got FTS5 working and it is incredible! The lookup time for returning all rows where company name contains \"Musk\" from my table of 16,428,090 rows has dropped from `13,340.019` ms to `15.6`ms. Well below the 100ms latency for the \"real time autocomplete\" feel (which doesn't currently include the http call).\r\n\r\nSo cool! Thanks again for the pointers and awesome datasette!", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 512996469, "label": "Ways to improve fuzzy search speed on larger data sets?"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/607#issuecomment-548060038", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/607", "id": 548060038, "node_id": "MDEyOklzc3VlQ29tbWVudDU0ODA2MDAzOA==", "user": {"value": 8431341, "label": "zeluspudding"}, "created_at": "2019-10-30T18:47:57Z", "updated_at": "2019-10-30T18:47:57Z", "author_association": "NONE", "body": "Hi Simon, thanks for the pointer! Feeling good that I came to your conclusion a few days ago. I did hit a snag with figuring out how to compile a special version of sqlite for my windows machine (which I only realized I needed to do after running your command `sqlite-utils enable-fts mydatabase.db items name description`). \r\n\r\nI'll try to solve that problem next week and report back here with my findings (if you know of a good tutorial for compiling on windows, I'm all ears). 
Either way, I'll try to close this issue out in the next two weeks. Thanks again!", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 512996469, "label": "Ways to improve fuzzy search speed on larger data sets?"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/607#issuecomment-546752311", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/607", "id": 546752311, "node_id": "MDEyOklzc3VlQ29tbWVudDU0Njc1MjMxMQ==", "user": {"value": 8431341, "label": "zeluspudding"}, "created_at": "2019-10-28T00:37:10Z", "updated_at": "2019-10-28T00:37:10Z", "author_association": "NONE", "body": "UPDATE:\r\nAccording to tips suggested in [Squeezing Performance from SQLite: Indexes? Indexes!](https://medium.com/@JasonWyatt/squeezing-performance-from-sqlite-indexes-indexes-c4e175f3c346) I have added an index to my large table and benchmarked query speeds in the case where I want to return `all rows`, `rows exactly equal to 'Musk Elon'` and, `rows like 'musk'`. Indexing reduced query time for each of those measures and **dramatically** reduced the time to return `rows exactly equal to 'Musk Elon'` as shown below:\r\n\r\n> table: edgar_idx\r\n> rows: 16,428,090 rows\r\n> **indexed: False**\r\n> Return all rows where company name exactly equal to Musk Elon\r\n> query: select rowid, * from edgar_idx where \"company\" = :p0 order by rowid limit 101\r\n> query time: Query took 21821.031ms\r\n> \r\n> Return all rows where company name contains Musk\r\n> query: select rowid, * from edgar_idx where \"company\" like :p0 order by rowid limit 101\r\n> query time: Query took 20505.029ms\r\n> \r\n> Return everything\r\n> query: select rowid, * from edgar_idx order by rowid limit 101\r\n> query time: Query took 7985.011ms\r\n> \r\n> **indexed: True**\r\n> Return all rows where company name exactly equal to Musk Elon\r\n> query: select rowid, * from edgar_idx where \"company\" = :p0 order by rowid limit 101\r\n> query time: Query took 30.0ms\r\n> \r\n> Return all rows where company name contains Musk\r\n> query: select rowid, * from edgar_idx where \"company\" like :p0 order by rowid limit 101\r\n> query time: Query took 13340.019ms\r\n> \r\n> Return everything\r\n> query: select rowid, * from edgar_idx order by rowid limit 101\r\n> query time: Query took 2190.003ms\r\n\r\nSo indexing reduced query time for an exact match to \"Musk Elon\" from almost `22 seconds` to `30.0ms`. **That's amazing and truly promising!** However, an autocomplete feature relies on fuzzy / incomplete matching, which is more similar to the `contains 'musk'` query... Unfortunately, that takes 13 seconds even after indexing. 
So the hunt for a fast fuzzy / autocomplete search capability persists.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 512996469, "label": "Ways to improve fuzzy search speed on larger data sets?"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/607#issuecomment-546723302", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/607", "id": 546723302, "node_id": "MDEyOklzc3VlQ29tbWVudDU0NjcyMzMwMg==", "user": {"value": 8431341, "label": "zeluspudding"}, "created_at": "2019-10-27T18:59:55Z", "updated_at": "2019-10-27T19:00:48Z", "author_association": "NONE", "body": "Ultimately, I'm needing to serve searches like this to multiple users (at times concurrently). Given the size of the database I'm working with, can anyone comment as to whether I should be storing this in something like MySQL or Postgres rather than SQLite. I know there's been much [defense of sqlite being performant](https://www.sqlite.org/whentouse.html) but I wonder if those arguments break down as the database size increases.\r\n\r\nFor example, if I scroll to the bottom of that linked page, where it says **Checklist For Choosing The Right Database Engine**, here's how I answer those questions:\r\n\r\n - Is the data separated from the application by a network? \u2192 choose client/server\r\n __Yes__\r\n- Many concurrent writers? \u2192 choose client/server\r\n __Not exactly. I may have many concurrent readers but almost no concurrent writers.__\r\n- Big data? \u2192 choose client/server\r\n __No, my database is less than 40 gb and wont approach a terabyte in the next decade.__\r\n\r\nSo is sqlite still a good idea here?", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 512996469, "label": "Ways to improve fuzzy search speed on larger data sets?"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/607#issuecomment-546722281", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/607", "id": 546722281, "node_id": "MDEyOklzc3VlQ29tbWVudDU0NjcyMjI4MQ==", "user": {"value": 8431341, "label": "zeluspudding"}, "created_at": "2019-10-27T18:46:29Z", "updated_at": "2019-10-27T19:00:40Z", "author_association": "NONE", "body": "Update: I've created a table of only unique names. This reduces the search space from over 16 million, to just about 640,000. Interestingly, it takes less than 2 seconds to create this table using Python. Performing the same search that we did earlier for `elon musk` takes nearly a second - much faster than before but still not speedy enough for an autocomplete feature (which usually needs to return results within 100ms to feel \"real time\"). \r\n\r\nAny ideas for slashing the search speed nearly 10 fold?\r\n\r\n> ![image](https://user-images.githubusercontent.com/8431341/67639587-b6c02b00-f8bf-11e9-9344-1d8667cad395.png)\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 512996469, "label": "Ways to improve fuzzy search speed on larger data sets?"}, "performed_via_github_app": null}
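A note on the `Authorization: bearer xxx` comment in issue #699 above: Datasette is built on ASGI rather than Flask, so per-request API-key checking would sit at the ASGI layer. Below is a minimal, hypothetical sketch of that idea only; the `BearerAuthMiddleware` name and the `API_KEYS` store are illustrative assumptions, not part of Datasette's actual API (designing the real mechanism is what issue #699 is about).

```python
# Hypothetical sketch: checking "Authorization: Bearer <token>" in
# ASGI middleware. API_KEYS is a stand-in for a real token store.
API_KEYS = {"secret-token-1"}

class BearerAuthMiddleware:
    def __init__(self, app):
        self.app = app  # the wrapped ASGI application

    async def __call__(self, scope, receive, send):
        if scope["type"] == "http":
            # ASGI headers arrive as (name, value) byte pairs, names lowercased
            headers = dict(scope.get("headers") or [])
            auth = headers.get(b"authorization", b"").decode("latin-1")
            token = auth[7:] if auth.lower().startswith("bearer ") else None
            if token not in API_KEYS:
                # Reject the request before it reaches the app
                await send({
                    "type": "http.response.start",
                    "status": 401,
                    "headers": [(b"content-type", b"text/plain")],
                })
                await send({"type": "http.response.body", "body": b"Unauthorized"})
                return
        await self.app(scope, receive, send)
```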
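The FTS5 result in comment 550649607 (13,340.019ms down to 15.6ms) follows from the `sqlite-utils enable-fts` command quoted in the thread. A sketch of the equivalent Python API is below, assuming the `items` table and `name`/`description` columns from that command, and the `items_fts` shadow table that sqlite-utils creates alongside the source table:

```python
import sqlite_utils

db = sqlite_utils.Database("mydatabase.db")

# Python equivalent of the CLI command quoted in the thread:
#   sqlite-utils enable-fts mydatabase.db items name description
db["items"].enable_fts(["name", "description"], fts_version="FTS5")

# FTS5 prefix queries ("musk*") are what make sub-100ms autocomplete
# feasible: MATCH consults the inverted index rather than scanning rows.
rows = db.execute(
    "SELECT rowid, * FROM items_fts WHERE items_fts MATCH ?", ["musk*"]
).fetchall()
```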
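The benchmark in comment 546752311 illustrates a general SQLite behavior: a b-tree index serves equality lookups with a binary search, but a `LIKE` pattern with a leading wildcard gives the b-tree no prefix to seek to, so every row is scanned. A sketch with the standard library, using the `edgar_idx` table and `company` column from the thread (the index name is an assumption):

```python
import sqlite3

conn = sqlite3.connect("mydatabase.db")

# A b-tree index turns the equality lookup into a binary search,
# which is what took "Musk Elon" from ~22s to ~30ms above.
conn.execute('CREATE INDEX IF NOT EXISTS idx_company ON edgar_idx("company")')

# Uses the index: exact match
cur = conn.execute(
    'SELECT rowid, * FROM edgar_idx WHERE "company" = ? ORDER BY rowid LIMIT 101',
    ("Musk Elon",),
)

# Cannot use the index: the leading wildcard forces a scan of all ~16.4M rows,
# which is why this query still took ~13s even after indexing.
cur = conn.execute(
    'SELECT rowid, * FROM edgar_idx WHERE "company" LIKE ? ORDER BY rowid LIMIT 101',
    ("%musk%",),
)
```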
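Finally, the unique-names table from comment 546722281 can get under the 100ms autocomplete budget if the search is relaxed from "contains" to "starts with": SQLite's LIKE optimization can rewrite a case-insensitive prefix pattern into an index range scan when the column uses the NOCASE collation. A sketch under those assumptions (the `company_names` table and index name are hypothetical; full substring matching still needs FTS5, as the thread later confirms):

```python
import sqlite3

conn = sqlite3.connect("mydatabase.db")
conn.executescript("""
-- Hypothetical helper table; edgar_idx / "company" come from the thread.
CREATE TABLE IF NOT EXISTS company_names (name TEXT COLLATE NOCASE);
DELETE FROM company_names;  -- keep the rebuild idempotent
INSERT INTO company_names SELECT DISTINCT "company" FROM edgar_idx;
-- NOCASE collation lets the LIKE optimization use this index
-- for case-insensitive prefix patterns.
CREATE INDEX IF NOT EXISTS idx_company_names ON company_names(name);
""")
conn.commit()

# Prefix match ('musk%') can use the index; '%musk%' still cannot.
rows = conn.execute(
    "SELECT name FROM company_names WHERE name LIKE ? LIMIT 10", ("musk%",)
).fetchall()
```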