home / github

Menu
  • Search all tables
  • GraphQL API

issue_comments

Table actions
  • GraphQL API for issue_comments

5 rows where author_association = "CONTRIBUTOR" and "updated_at" is on date 2020-06-23 sorted by updated_at descending

✖
✖
✖

✎ View and edit SQL

This data as json, CSV (advanced)

Suggested facets: created_at (date), updated_at (date)

user 1

  • abdusco 5

issue 1

  • Database page loads too slowly with many large tables (due to table counts) 5

author_association 1

  • CONTRIBUTOR · 5 ✖
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions issue performed_via_github_app
648232645 https://github.com/simonw/datasette/issues/859#issuecomment-648232645 https://api.github.com/repos/simonw/datasette/issues/859 MDEyOklzc3VlQ29tbWVudDY0ODIzMjY0NQ== abdusco 3243482 2020-06-23T15:19:53Z 2020-06-23T15:19:53Z CONTRIBUTOR

The issue seems to appear sporadically, like when I return to database page after a while, during which some records have been added to the database.

I've just visited database, page first visit took ~10s, consecutive visits took 0.3s.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Database page loads too slowly with many large tables (due to table counts) 642572841  
647925594 https://github.com/simonw/datasette/issues/859#issuecomment-647925594 https://api.github.com/repos/simonw/datasette/issues/859 MDEyOklzc3VlQ29tbWVudDY0NzkyNTU5NA== abdusco 3243482 2020-06-23T05:55:21Z 2020-06-23T06:28:29Z CONTRIBUTOR

Hmm, not seeing the problem now.
I've removed the commented out sections in database.py and restarted the process. Database page now loads in <250ms.

I have couple of workers that check some pages regularly and scrape new content and save to the DB. Could it be that datasette tries to recount tables every time database size changes? Normally it keeps a count cache, but as DB gets updated so often (new content every 5 min or so) it's practically recounting every time I go to the database page?

EDIT: It turns out it doesn't hold cache with mutable databases.

I'll update the issue with more findings and a better way to reproduce the problem if I encounter it again.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Database page loads too slowly with many large tables (due to table counts) 642572841  
647936117 https://github.com/simonw/datasette/issues/859#issuecomment-647936117 https://api.github.com/repos/simonw/datasette/issues/859 MDEyOklzc3VlQ29tbWVudDY0NzkzNjExNw== abdusco 3243482 2020-06-23T06:25:17Z 2020-06-23T06:25:17Z CONTRIBUTOR

sqlite-generate many-cols.db --tables 2 --rows 200000 --columns 50

Looks like that will take 35 minutes to run (it's not a particularly fast tool).

Try chunking write operations into batches every 1000 records or so.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Database page loads too slowly with many large tables (due to table counts) 642572841  
647935300 https://github.com/simonw/datasette/issues/859#issuecomment-647935300 https://api.github.com/repos/simonw/datasette/issues/859 MDEyOklzc3VlQ29tbWVudDY0NzkzNTMwMA== abdusco 3243482 2020-06-23T06:23:01Z 2020-06-23T06:23:01Z CONTRIBUTOR

You said "200k+, 50+ rows in a couple of tables" - does that mean 50+ columns? I'll try with larger numbers of columns and see what difference that makes.

Ah that was a typo, I meant 50k.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Database page loads too slowly with many large tables (due to table counts) 642572841  
647923666 https://github.com/simonw/datasette/issues/859#issuecomment-647923666 https://api.github.com/repos/simonw/datasette/issues/859 MDEyOklzc3VlQ29tbWVudDY0NzkyMzY2Ng== abdusco 3243482 2020-06-23T05:49:31Z 2020-06-23T05:49:31Z CONTRIBUTOR

I think I should mention that having FTS on all tables mean I have 5 visible, 25 hidden (FTS) tables displayed on database page.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Database page loads too slowly with many large tables (due to table counts) 642572841  

Advanced export

JSON shape: default, array, newline-delimited, object

CSV options:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
, [performed_via_github_app] TEXT);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);
Powered by Datasette · Queries took 846.15ms · About: github-to-sqlite
  • Sort ascending
  • Sort descending
  • Facet by this
  • Hide this column
  • Show all columns
  • Show not-blank rows