
issue_comments


2 rows where "created_at" is on date 2019-10-27 and "updated_at" is on date 2019-10-27, sorted by updated_at descending.
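Expressed as SQL, the filter above is roughly the following (a sketch; Datasette's "on date" filters compare the date() of the stored text timestamps):

SELECT * FROM issue_comments
WHERE date(created_at) = '2019-10-27'
  AND date(updated_at) = '2019-10-27'
ORDER BY updated_at DESC;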


Row 1

id: 546723302
html_url: https://github.com/simonw/datasette/issues/607#issuecomment-546723302
issue_url: https://api.github.com/repos/simonw/datasette/issues/607
node_id: MDEyOklzc3VlQ29tbWVudDU0NjcyMzMwMg==
user: zeluspudding (8431341)
created_at: 2019-10-27T18:59:55Z
updated_at: 2019-10-27T19:00:48Z
author_association: NONE
body:

Ultimately, I'm needing to serve searches like this to multiple users (at times concurrently). Given the size of the database I'm working with, can anyone comment on whether I should be storing this in something like MySQL or Postgres rather than SQLite? I know there's been much defense of SQLite being performant, but I wonder if those arguments break down as the database size increases.

For example, if I scroll to the bottom of that linked page, to the section "Checklist For Choosing The Right Database Engine", here's how I answer its questions:

  • Is the data separated from the application by a network? → choose client/server. Yes.
  • Many concurrent writers? → choose client/server. Not exactly: I may have many concurrent readers, but almost no concurrent writers.
  • Big data? → choose client/server. No, my database is less than 40 GB and won't approach a terabyte in the next decade.

So is SQLite still a good idea here?
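For what it's worth, a read-heavy, write-light workload like the one described above is the case SQLite's WAL journal mode is built for; a minimal sketch of the relevant pragmas (an illustration, not something proposed in this comment):

PRAGMA journal_mode = WAL;    -- readers no longer block the (rare) writer, and vice versa
PRAGMA synchronous = NORMAL;  -- common companion setting for WAL databases

With WAL enabled, each concurrent reader opens its own connection, and reads proceed even while a write transaction is in flight.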

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Ways to improve fuzzy search speed on larger data sets? (512996469)
performed_via_github_app: (none)
Row 2

id: 546722281
html_url: https://github.com/simonw/datasette/issues/607#issuecomment-546722281
issue_url: https://api.github.com/repos/simonw/datasette/issues/607
node_id: MDEyOklzc3VlQ29tbWVudDU0NjcyMjI4MQ==
user: zeluspudding (8431341)
created_at: 2019-10-27T18:46:29Z
updated_at: 2019-10-27T19:00:40Z
author_association: NONE
body:

Update: I've created a table of only unique names. This reduces the search space from over 16 million rows to about 640,000. Interestingly, it takes less than 2 seconds to create this table using Python. Performing the same search that we did earlier for "elon musk" takes nearly a second: much faster than before, but still not fast enough for an autocomplete feature (which usually needs to return results within 100 ms to feel "real time").

Any ideas for cutting the search time by another factor of ten?
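One common route to sub-100 ms autocomplete in SQLite (a sketch, not necessarily the fix adopted in this thread; the unique_names table name is assumed) is an FTS5 index queried with a prefix match:

CREATE VIRTUAL TABLE names_fts USING fts5(name);                -- full-text index over the names
INSERT INTO names_fts (name) SELECT name FROM unique_names;     -- load the deduplicated table described above
SELECT name FROM names_fts WHERE names_fts MATCH 'elon*' LIMIT 10;

The trailing * makes this a prefix query, which FTS5 answers from its token index rather than by scanning all 640,000 rows.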

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Ways to improve fuzzy search speed on larger data sets? (512996469)
performed_via_github_app: (none)


Table schema:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id]),
   [performed_via_github_app] TEXT
);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);
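Given the foreign keys above, the user and issue labels shown in the rows come from joins along these references; for example (a sketch; the users.login column name is an assumption):

SELECT c.id, u.login, c.created_at
FROM issue_comments AS c
JOIN users AS u ON c.[user] = u.id
WHERE c.issue = 512996469;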