issue_comments
6 rows where issue = 512996469 sorted by updated_at descending
id | html_url | issue_url | node_id | user | created_at | updated_at ▲ | author_association | body | reactions | issue | performed_via_github_app |
---|---|---|---|---|---|---|---|---|---|---|---|
550649607 | https://github.com/simonw/datasette/issues/607#issuecomment-550649607 | https://api.github.com/repos/simonw/datasette/issues/607 | MDEyOklzc3VlQ29tbWVudDU1MDY0OTYwNw== | zeluspudding 8431341 | 2019-11-07T03:38:10Z | 2019-11-07T03:38:10Z | NONE | I just got FTS5 working and it is incredible! The lookup time for returning all rows where company name contains "Musk" from my table of 16,428,090 rows has dropped from So cool! Thanks again for the pointers and awesome datasette! |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Ways to improve fuzzy search speed on larger data sets? 512996469 | |
548060038 | https://github.com/simonw/datasette/issues/607#issuecomment-548060038 | https://api.github.com/repos/simonw/datasette/issues/607 | MDEyOklzc3VlQ29tbWVudDU0ODA2MDAzOA== | zeluspudding 8431341 | 2019-10-30T18:47:57Z | 2019-10-30T18:47:57Z | NONE | Hi Simon, thanks for the pointer! Feeling good that I came to your conclusion a few days ago. I did hit a snag with figuring out how to compile a special version of sqlite for my windows machine (which I only realized I needed to do after running your command I'll try to solve that problem next week and report back here with my findings (if you know of a good tutorial for compiling on windows, I'm all ears). Either way, I'll try to close this issue out in the next two weeks. Thanks again! |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Ways to improve fuzzy search speed on larger data sets? 512996469 | |
548055544 | https://github.com/simonw/datasette/issues/607#issuecomment-548055544 | https://api.github.com/repos/simonw/datasette/issues/607 | MDEyOklzc3VlQ29tbWVudDU0ODA1NTU0NA== | simonw 9599 | 2019-10-30T18:37:44Z | 2019-10-30T18:37:52Z | OWNER | Hi @zeluspudding You're running your search queries using the "contains" filter, which uses a SQL Instead, you should take a look at SQLite's FTS - full text indexing feature. You can build an FTS index against a column and dramatically speed up searches for words within that column. This documentation should help get you started: https://datasette.readthedocs.io/en/stable/full_text_search.html |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Ways to improve fuzzy search speed on larger data sets? 512996469 | |
546752311 | https://github.com/simonw/datasette/issues/607#issuecomment-546752311 | https://api.github.com/repos/simonw/datasette/issues/607 | MDEyOklzc3VlQ29tbWVudDU0Njc1MjMxMQ== | zeluspudding 8431341 | 2019-10-28T00:37:10Z | 2019-10-28T00:37:10Z | NONE | UPDATE:
According to tips suggested in Squeezing Performance from SQLite: Indexes? Indexes! I have added an index to my large table and benchmarked query speeds in the case where I want to return
So indexing reduced query time for an exact match to "Musk Elon" from almost |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Ways to improve fuzzy search speed on larger data sets? 512996469 | |
546723302 | https://github.com/simonw/datasette/issues/607#issuecomment-546723302 | https://api.github.com/repos/simonw/datasette/issues/607 | MDEyOklzc3VlQ29tbWVudDU0NjcyMzMwMg== | zeluspudding 8431341 | 2019-10-27T18:59:55Z | 2019-10-27T19:00:48Z | NONE | Ultimately, I need to serve searches like this to multiple users (at times concurrently). Given the size of the database I'm working with, can anyone comment on whether I should be storing this in something like MySQL or Postgres rather than SQLite? I know there's been much defense of SQLite being performant, but I wonder if those arguments break down as the database size increases. For example, if I scroll to the bottom of that linked page, where it says Checklist For Choosing The Right Database Engine, here's how I answer those questions:
So is sqlite still a good idea here? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Ways to improve fuzzy search speed on larger data sets? 512996469 | |
546722281 | https://github.com/simonw/datasette/issues/607#issuecomment-546722281 | https://api.github.com/repos/simonw/datasette/issues/607 | MDEyOklzc3VlQ29tbWVudDU0NjcyMjI4MQ== | zeluspudding 8431341 | 2019-10-27T18:46:29Z | 2019-10-27T19:00:40Z | NONE | Update: I've created a table of only unique names. This reduces the search space from over 16 million, to just about 640,000. Interestingly, it takes less than 2 seconds to create this table using Python. Performing the same search that we did earlier for Any ideas for slashing the search speed nearly 10 fold? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Ways to improve fuzzy search speed on larger data sets? 512996469 |
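The thread contrasts three lookup strategies: the "contains" filter, an ordinary index for exact matches, and an FTS index. The following is a minimal sketch of that contrast using Python's built-in sqlite3 module. The table name `companies`, the column `company_name`, and the sample rows are hypothetical (the thread does not show the real schema), the "contains" filter is assumed to translate to a LIKE with wildcards on both sides, and the sketch assumes the local SQLite build includes FTS5.

```python
import sqlite3

# Hypothetical table, column, and rows; the real 16-million-row table is not shown in the thread.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE companies (id INTEGER PRIMARY KEY, company_name TEXT)")
conn.executemany(
    "INSERT INTO companies (company_name) VALUES (?)",
    [("Musk Elon",), ("Musk Industries",), ("Acme Corp",), ("Muskrat Farms",)],
)

# "Contains" style search: a LIKE pattern with a leading wildcard cannot use
# an ordinary index, so SQLite has to examine every row.
like_rows = conn.execute(
    "SELECT company_name FROM companies WHERE company_name LIKE ? ORDER BY id",
    ("%musk%",),
).fetchall()

# Ordinary index: helps exact-match lookups, not substring searches.
conn.execute("CREATE INDEX idx_companies_name ON companies (company_name)")
exact_rows = conn.execute(
    "SELECT company_name FROM companies WHERE company_name = ?",
    ("Musk Elon",),
).fetchall()

# FTS5 index: the text is tokenized into words, and MATCH looks words up in
# the index instead of scanning rows. This requires an FTS5-enabled SQLite build.
conn.execute("CREATE VIRTUAL TABLE companies_fts USING fts5(company_name)")
conn.execute(
    "INSERT INTO companies_fts (rowid, company_name) "
    "SELECT id, company_name FROM companies"
)
fts_rows = conn.execute(
    "SELECT company_name FROM companies_fts WHERE companies_fts MATCH ? ORDER BY rowid",
    ("musk",),
).fetchall()

print(like_rows)
print(exact_rows)
print(fts_rows)
```

Note the difference in semantics: the LIKE query matches any substring, so it also returns "Muskrat Farms", while the FTS query matches whole words and returns only the two rows containing the word "Musk". A prefix query such as `musk*` would widen the FTS match.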
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id]),
   [performed_via_github_app] TEXT
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
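As an illustration of how this schema supports the view at the top of the page (rows where issue = 512996469, sorted by updated_at descending), here is a small sketch using Python's sqlite3 module. It recreates a trimmed copy of the table with only the columns the query needs, loads three of the comment ids and timestamps listed above, and runs a query of roughly the shape Datasette would be expected to generate. The exact SQL Datasette issues is an assumption here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Trimmed copy of the schema above: only the columns this query touches.
conn.executescript(
    """
    CREATE TABLE [issue_comments] (
        [id] INTEGER PRIMARY KEY,
        [updated_at] TEXT,
        [issue] INTEGER
    );
    CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
    """
)

# Three rows taken from the table on this page, inserted out of order.
conn.executemany(
    "INSERT INTO issue_comments (id, updated_at, issue) VALUES (?, ?, ?)",
    [
        (548055544, "2019-10-30T18:37:52Z", 512996469),
        (550649607, "2019-11-07T03:38:10Z", 512996469),
        (548060038, "2019-10-30T18:47:57Z", 512996469),
    ],
)

# The filter on [issue] can use idx_issue_comments_issue. ISO 8601 timestamps
# stored as TEXT sort correctly as plain strings.
ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM issue_comments WHERE issue = ? ORDER BY updated_at DESC",
        (512996469,),
    )
]
print(ids)
```

The resulting order matches the row order shown in the table: newest `updated_at` first.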