issue_comments

7 rows where user = 8431341 sorted by updated_at descending


id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions issue
626991001 https://github.com/simonw/datasette/issues/699#issuecomment-626991001 https://api.github.com/repos/simonw/datasette/issues/699 MDEyOklzc3VlQ29tbWVudDYyNjk5MTAwMQ== zeluspudding 8431341 2020-05-11T22:06:34Z 2020-05-11T22:06:34Z NONE

Very nice! Thank you for sharing that :+1: :) Will try it out!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Authentication (and permissions) as a core concept 582526961
626807487 https://github.com/simonw/datasette/issues/699#issuecomment-626807487 https://api.github.com/repos/simonw/datasette/issues/699 MDEyOklzc3VlQ29tbWVudDYyNjgwNzQ4Nw== zeluspudding 8431341 2020-05-11T16:23:57Z 2020-05-11T16:24:59Z NONE

Authorization: Bearer xxx auth for API keys is a big plus for me. I looked into just adding this to your Flask logic, but learned this project doesn't use Flask. Interesting 🤔
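Header-based API-key auth of this kind is framework-agnostic; a minimal sketch of parsing an Authorization: Bearer header (the token store here is purely illustrative, not any real Datasette API):

```python
# Minimal sketch of "Authorization: Bearer xxx" API-key checking.
# VALID_TOKENS is an illustrative stand-in for a real key store.
VALID_TOKENS = {"xxx"}

def is_authorized(headers: dict) -> bool:
    # The header value is "<scheme> <token>"; accept only the Bearer scheme.
    value = headers.get("Authorization", "")
    scheme, _, token = value.partition(" ")
    return scheme.lower() == "bearer" and token in VALID_TOKENS
```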

Authentication (and permissions) as a core concept 582526961
550649607 https://github.com/simonw/datasette/issues/607#issuecomment-550649607 https://api.github.com/repos/simonw/datasette/issues/607 MDEyOklzc3VlQ29tbWVudDU1MDY0OTYwNw== zeluspudding 8431341 2019-11-07T03:38:10Z 2019-11-07T03:38:10Z NONE

I just got FTS5 working and it is incredible! The lookup time for returning all rows where the company name contains "Musk" from my table of 16,428,090 rows has dropped from 13,340.019 ms to 15.6 ms. That's well below the 100 ms latency needed for a "real-time autocomplete" feel (though it doesn't yet include the HTTP call).

So cool! Thanks again for the pointers and awesome datasette!
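The FTS5 setup described above can be sketched with Python's built-in sqlite3, assuming your SQLite build includes FTS5 (table and column names here are illustrative, not the commenter's actual schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO companies (name) VALUES (?)",
    [("Musk Elon",), ("Acme Corp",), ("Musketeer Holdings",)],
)

# External-content FTS5 table indexing the name column of companies.
conn.execute(
    "CREATE VIRTUAL TABLE companies_fts USING fts5("
    "name, content='companies', content_rowid='id')"
)
conn.execute("INSERT INTO companies_fts (rowid, name) SELECT id, name FROM companies")

# A prefix query ('musk*') gives the incomplete-match behavior autocomplete
# needs; MATCH walks the FTS index rather than scanning every row.
rows = conn.execute(
    "SELECT name FROM companies_fts WHERE companies_fts MATCH ? ORDER BY rank",
    ("musk*",),
).fetchall()
```

This is the kind of lookup that turns a 13-second LIKE scan into a low-millisecond index probe.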

Ways to improve fuzzy search speed on larger data sets? 512996469
548060038 https://github.com/simonw/datasette/issues/607#issuecomment-548060038 https://api.github.com/repos/simonw/datasette/issues/607 MDEyOklzc3VlQ29tbWVudDU0ODA2MDAzOA== zeluspudding 8431341 2019-10-30T18:47:57Z 2019-10-30T18:47:57Z NONE

Hi Simon, thanks for the pointer! Feeling good that I came to your conclusion a few days ago. I did hit a snag figuring out how to compile a special version of SQLite for my Windows machine (which I only realized I needed to do after running your command: sqlite-utils enable-fts mydatabase.db items name description).

I'll try to solve that problem next week and report back here with my findings (if you know of a good tutorial for compiling on Windows, I'm all ears). Either way, I'll try to close this issue out in the next two weeks. Thanks again!

Ways to improve fuzzy search speed on larger data sets? 512996469
546752311 https://github.com/simonw/datasette/issues/607#issuecomment-546752311 https://api.github.com/repos/simonw/datasette/issues/607 MDEyOklzc3VlQ29tbWVudDU0Njc1MjMxMQ== zeluspudding 8431341 2019-10-28T00:37:10Z 2019-10-28T00:37:10Z NONE

UPDATE:
According to tips suggested in Squeezing Performance from SQLite: Indexes? Indexes!, I added an index to my large table and benchmarked query speeds for three cases: returning all rows, rows exactly equal to 'Musk Elon', and rows like 'musk'. Indexing reduced query time for each of those measures, and dramatically so for the exact match on 'Musk Elon', as shown below:

table: edgar_idx
rows: 16,428,090
indexed: False
Return all rows where company name exactly equal to Musk Elon
query: select rowid, * from edgar_idx where "company" = :p0 order by rowid limit 101
query time: Query took 21821.031ms

Return all rows where company name contains Musk
query: select rowid, * from edgar_idx where "company" like :p0 order by rowid limit 101
query time: Query took 20505.029ms

Return everything
query: select rowid, * from edgar_idx order by rowid limit 101
query time: Query took 7985.011ms

indexed: True
Return all rows where company name exactly equal to Musk Elon
query: select rowid, * from edgar_idx where "company" = :p0 order by rowid limit 101
query time: Query took 30.0ms

Return all rows where company name contains Musk
query: select rowid, * from edgar_idx where "company" like :p0 order by rowid limit 101
query time: Query took 13340.019ms

Return everything
query: select rowid, * from edgar_idx order by rowid limit 101
query time: Query took 2190.003ms

So indexing reduced query time for an exact match to "Musk Elon" from almost 22 seconds to 30.0ms. That's amazing and truly promising! However, an autocomplete feature relies on fuzzy / incomplete matching, which is more similar to the contains 'musk' query... Unfortunately, that takes 13 seconds even after indexing. So the hunt for a fast fuzzy / autocomplete search capability persists.
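The asymmetry above (exact match in 30 ms, contains 'musk' still 13 seconds) follows from how a B-tree index works: an exact match can seek directly into the index, but LIKE '%musk%' has a leading wildcard, so every row must still be scanned. EXPLAIN QUERY PLAN makes this visible; a sketch with a toy version of the table above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edgar_idx (id INTEGER PRIMARY KEY, company TEXT, cik TEXT)")
conn.execute("CREATE INDEX idx_company ON edgar_idx (company)")

def plan(sql, params):
    # The fourth column of each EXPLAIN QUERY PLAN row is the readable detail.
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params))

# Exact match: SQLite seeks into idx_company.
exact = plan("SELECT * FROM edgar_idx WHERE company = ?", ("Musk Elon",))

# Leading-wildcard LIKE: the index cannot help, so it is a full-table SCAN.
contains = plan("SELECT * FROM edgar_idx WHERE company LIKE ?", ("%musk%",))
```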

Ways to improve fuzzy search speed on larger data sets? 512996469
546723302 https://github.com/simonw/datasette/issues/607#issuecomment-546723302 https://api.github.com/repos/simonw/datasette/issues/607 MDEyOklzc3VlQ29tbWVudDU0NjcyMzMwMg== zeluspudding 8431341 2019-10-27T18:59:55Z 2019-10-27T19:00:48Z NONE

Ultimately, I'm needing to serve searches like this to multiple users (at times concurrently). Given the size of the database I'm working with, can anyone comment on whether I should be storing this in something like MySQL or Postgres rather than SQLite? I know there's been much defense of SQLite being performant, but I wonder if those arguments break down as the database size increases.

For example, if I scroll to the bottom of that linked page, where it says Checklist For Choosing The Right Database Engine, here's how I answer those questions:

  • Is the data separated from the application by a network? → choose client/server
    Yes
  • Many concurrent writers? → choose client/server
    Not exactly. I may have many concurrent readers but almost no concurrent writers.
  • Big data? → choose client/server
    No, my database is less than 40 GB and won't approach a terabyte in the next decade.

So is SQLite still a good idea here?
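For the read-heavy pattern described in the checklist answers (many concurrent readers, almost no writers), SQLite is generally a good fit, particularly in WAL mode, where readers don't block the writer and vice versa. A minimal sketch:

```python
import os
import sqlite3
import tempfile

# WAL requires a file-backed database (it is a no-op for :memory:).
path = os.path.join(tempfile.mkdtemp(), "app.db")
conn = sqlite3.connect(path)

# In WAL mode readers keep reading while a single writer commits,
# which suits a "many readers, almost no writers" workload.
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
```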

Ways to improve fuzzy search speed on larger data sets? 512996469
546722281 https://github.com/simonw/datasette/issues/607#issuecomment-546722281 https://api.github.com/repos/simonw/datasette/issues/607 MDEyOklzc3VlQ29tbWVudDU0NjcyMjI4MQ== zeluspudding 8431341 2019-10-27T18:46:29Z 2019-10-27T19:00:40Z NONE

Update: I've created a table of only unique names. This reduces the search space from over 16 million rows to about 640,000. Interestingly, it takes less than 2 seconds to create this table using Python. Performing the same search that we did earlier for 'elon musk' takes nearly a second - much faster than before, but still not speedy enough for an autocomplete feature (which usually needs to return results within 100 ms to feel "real time").

Any ideas for slashing the search speed nearly 10 fold?
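One route to that speedup on the deduplicated table: if autocomplete can be satisfied by prefix matching rather than arbitrary substrings, LIKE 'musk%' (no leading wildcard) can use an ordinary index. A sketch with illustrative names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (name TEXT)")
conn.executemany(
    "INSERT INTO names VALUES (?)",
    [("musk elon",), ("musk kimbal",), ("muskegon mills",), ("acme corp",)],
)

# Deduplicate into a small lookup table, as described above, and index it.
# (With case_sensitive_like=ON, or a COLLATE NOCASE index, SQLite can turn
# a prefix LIKE into a fast index range scan.)
conn.execute("CREATE TABLE unique_names AS SELECT DISTINCT name FROM names")
conn.execute("CREATE INDEX idx_unique_name ON unique_names (name)")

matches = [r[0] for r in conn.execute(
    "SELECT name FROM unique_names WHERE name LIKE ? ORDER BY name LIMIT 10",
    ("musk%",),
)]
```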

Ways to improve fuzzy search speed on larger data sets? 512996469

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);
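The DDL above can be exercised directly with Python's built-in sqlite3; the query mirrors this page's filter (user = 8431341, sorted by updated_at descending), with a toy users table and one toy row standing in for the real export:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE [users] ([id] INTEGER PRIMARY KEY, [login] TEXT);
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER
);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
""")
conn.execute("INSERT INTO users (id, login) VALUES (8431341, 'zeluspudding')")
conn.execute(
    "INSERT INTO issue_comments (id, user, updated_at, body) VALUES (?, ?, ?, ?)",
    (626991001, 8431341, "2020-05-11T22:06:34Z", "Very nice!"),
)

# The page's query: this user's comments, newest update first,
# served by idx_issue_comments_user.
rows = conn.execute(
    "SELECT id, body FROM issue_comments WHERE user = ? ORDER BY updated_at DESC",
    (8431341,),
).fetchall()
```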
Powered by Datasette · Query took 23.744ms · About: github-to-sqlite