issues: 512996469

id node_id number title user state locked assignee milestone comments created_at updated_at closed_at author_association pull_request body repo type active_lock_reason performed_via_github_app reactions draft state_reason
512996469 MDU6SXNzdWU1MTI5OTY0Njk= 607 Ways to improve fuzzy search speed on larger data sets? 8431341 closed 0     6 2019-10-27T17:31:37Z 2019-11-07T03:38:10Z 2019-11-07T03:38:10Z NONE  

I have an SQLite table with 16 million rows in it. Having read @simonw's article "Fast Autocomplete Search for Your Website", I was curious to try Datasette to see what kind of query performance I could get out of it. In truth I don't need full-text search, since all I want is to give my users a way to search for the names of investors such as "Warren Buffett" or "Tim Cook" (whose names are in a single column).

On the first search, Datasette takes over 20 seconds to return all records associated with Elon Musk.

If I rerun the same search, it then takes almost 9 seconds.

That's far too slow to implement an autocomplete feature. I could reduce the latency by making a special table of only unique investor names, thereby reducing the search space to less than a million rows (then I'd need to implement a way to add only new investor names to the table as I receive new data, about 4,000 rows a day). Even if I did that, I'm concerned the new table wouldn't be lean enough to look up investor names quickly. Plus, even if I can implement the autocomplete feature, I would still have to look up the records for the selected investor, which would take between 8 and 20 seconds.
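To make the unique-names idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table name `filings` and column name `investor` are assumptions for illustration, not my real schema, and the helper functions are hypothetical. It does case-insensitive prefix matching (not true fuzzy matching) against a small lookup table, and adds an index on the big table so that fetching one investor's records does not need a full scan. I have not measured this against the 16 million row table.

```python
import sqlite3


def build_name_lookup(conn):
    """Create the unique-names table and an index on the large table.

    Assumes a hypothetical `filings` table with an `investor` text column.
    """
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS investor_names (
            name TEXT COLLATE NOCASE PRIMARY KEY
        ) WITHOUT ROWID;

        -- Index so that fetching one investor's rows avoids a full scan.
        CREATE INDEX IF NOT EXISTS idx_filings_investor
            ON filings (investor COLLATE NOCASE);

        -- Seed the lookup table; duplicates (ignoring ASCII case) are skipped.
        INSERT OR IGNORE INTO investor_names (name)
            SELECT DISTINCT investor FROM filings WHERE investor IS NOT NULL;
        """
    )
    conn.commit()


def add_new_names(conn, names):
    """Add names from a daily batch; names already present are ignored."""
    conn.executemany(
        "INSERT OR IGNORE INTO investor_names (name) VALUES (?)",
        [(n,) for n in names],
    )
    conn.commit()


def autocomplete(conn, prefix, limit=10):
    """Return up to `limit` names starting with `prefix` (ASCII case-insensitive).

    Written as a range comparison on the primary key, so SQLite can answer it
    with an index range scan instead of reading every row.
    """
    upper = prefix + "\U0010ffff"  # sorts after any name sharing the prefix
    rows = conn.execute(
        "SELECT name FROM investor_names "
        "WHERE name >= ? AND name < ? ORDER BY name LIMIT ?",
        (prefix, upper, limit),
    )
    return [r[0] for r in rows]


def records_for(conn, name):
    """Fetch all rows for one investor using the NOCASE index."""
    return conn.execute(
        "SELECT * FROM filings WHERE investor = ? COLLATE NOCASE", (name,)
    ).fetchall()
```

The intended flow would be: call `build_name_lookup` once, call `add_new_names` with each day's batch, serve keystrokes from `autocomplete`, and call `records_for` only after the user picks a name.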

Are there any tricks for speeding this up?

Here's my hardware:

107914493 issue    
{
    "url": "https://api.github.com/repos/simonw/datasette/issues/607/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed
