issue_comments

16 rows where issue = 642572841 sorted by updated_at descending

View and edit SQL

Suggested facets: created_at (date), updated_at (date)

user

author_association

issue

  • Database page loads too slowly with many large tables (due to table counts) · 16
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions issue
648669523 https://github.com/simonw/datasette/issues/859#issuecomment-648669523 https://api.github.com/repos/simonw/datasette/issues/859 MDEyOklzc3VlQ29tbWVudDY0ODY2OTUyMw== abdusco 3243482 2020-06-24T08:13:23Z 2020-06-24T10:30:36Z CONTRIBUTOR

I tried setting cache_size_kb=0 then cache_size_kb=100000, still getting this behavior. I even changed Database::table_counts and lowered time limit to 1

table_count = (
    await self.execute(
        "select count(*) from [{}]".format(table),
        custom_time_limit=1,
    )
).rows[0][0]
counts[table] = table_count

I feel like 10 seconds is a magic number, like a processing timeout and datasette gives up and returns the page.
Index page loads instantly, table page, query page, as well. But when I return to database page after some time, it loads in 10s.

EDIT:

It's always like 10 + 0.3s, like 10s wait and timeout then 300ms to render the page

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Database page loads too slowly with many large tables (due to table counts) 642572841
648234787 https://github.com/simonw/datasette/issues/859#issuecomment-648234787 https://api.github.com/repos/simonw/datasette/issues/859 MDEyOklzc3VlQ29tbWVudDY0ODIzNDc4Nw== simonw 9599 2020-06-23T15:22:51Z 2020-06-23T15:22:51Z OWNER

I wonder if this is a SQLite caching issue then?

Datasette has a configuration option for this but I haven't spent much time experimenting with it so I don't know how much of an impact it can have: https://datasette.readthedocs.io/en/stable/config.html#cache-size-kb

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Database page loads too slowly with many large tables (due to table counts) 642572841
648232645 https://github.com/simonw/datasette/issues/859#issuecomment-648232645 https://api.github.com/repos/simonw/datasette/issues/859 MDEyOklzc3VlQ29tbWVudDY0ODIzMjY0NQ== abdusco 3243482 2020-06-23T15:19:53Z 2020-06-23T15:19:53Z CONTRIBUTOR

The issue seems to appear sporadically, like when I return to database page after a while, during which some records have been added to the database.

I've just visited database, page first visit took ~10s, consecutive visits took 0.3s.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Database page loads too slowly with many large tables (due to table counts) 642572841
648163272 https://github.com/simonw/datasette/issues/859#issuecomment-648163272 https://api.github.com/repos/simonw/datasette/issues/859 MDEyOklzc3VlQ29tbWVudDY0ODE2MzI3Mg== simonw 9599 2020-06-23T13:52:23Z 2020-06-23T13:52:23Z OWNER

I'm chunking inserts at 100 at a time right now: https://github.com/simonw/sqlite-utils/blob/4d9a3204361d956440307a57bd18c829a15861db/sqlite_utils/db.py#L1030

I think the performance is more down to using Faker to create the test data - generating millions of entirely fake, randomized records takes a fair bit of time.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Database page loads too slowly with many large tables (due to table counts) 642572841
647925594 https://github.com/simonw/datasette/issues/859#issuecomment-647925594 https://api.github.com/repos/simonw/datasette/issues/859 MDEyOklzc3VlQ29tbWVudDY0NzkyNTU5NA== abdusco 3243482 2020-06-23T05:55:21Z 2020-06-23T06:28:29Z CONTRIBUTOR

Hmm, not seeing the problem now.
I've removed the commented out sections in database.py and restarted the process. Database page now loads in <250ms.

I have couple of workers that check some pages regularly and scrape new content and save to the DB. Could it be that datasette tries to recount tables every time database size changes? Normally it keeps a count cache, but as DB gets updated so often (new content every 5 min or so) it's practically recounting every time I go to the database page?

EDIT:
It turns out it doesn't hold cache with mutable databases.

I'll update the issue with more findings and a better way to reproduce the problem if I encounter it again.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Database page loads too slowly with many large tables (due to table counts) 642572841
647936117 https://github.com/simonw/datasette/issues/859#issuecomment-647936117 https://api.github.com/repos/simonw/datasette/issues/859 MDEyOklzc3VlQ29tbWVudDY0NzkzNjExNw== abdusco 3243482 2020-06-23T06:25:17Z 2020-06-23T06:25:17Z CONTRIBUTOR

sqlite-generate many-cols.db --tables 2 --rows 200000 --columns 50

Looks like that will take 35 minutes to run (it's not a particularly fast tool).

Try chunking write operations into batches every 1000 records or so.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Database page loads too slowly with many large tables (due to table counts) 642572841
647935300 https://github.com/simonw/datasette/issues/859#issuecomment-647935300 https://api.github.com/repos/simonw/datasette/issues/859 MDEyOklzc3VlQ29tbWVudDY0NzkzNTMwMA== abdusco 3243482 2020-06-23T06:23:01Z 2020-06-23T06:23:01Z CONTRIBUTOR

You said "200k+, 50+ rows in a couple of tables" - does that mean 50+ columns? I'll try with larger numbers of columns and see what difference that makes.

Ah that was a typo, I meant 50k.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Database page loads too slowly with many large tables (due to table counts) 642572841
647923666 https://github.com/simonw/datasette/issues/859#issuecomment-647923666 https://api.github.com/repos/simonw/datasette/issues/859 MDEyOklzc3VlQ29tbWVudDY0NzkyMzY2Ng== abdusco 3243482 2020-06-23T05:49:31Z 2020-06-23T05:49:31Z CONTRIBUTOR

I think I should mention that having FTS on all tables mean I have 5 visible, 25 hidden (FTS) tables displayed on database page.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Database page loads too slowly with many large tables (due to table counts) 642572841
647922203 https://github.com/simonw/datasette/issues/859#issuecomment-647922203 https://api.github.com/repos/simonw/datasette/issues/859 MDEyOklzc3VlQ29tbWVudDY0NzkyMjIwMw== abdusco 3243482 2020-06-23T05:44:58Z 2020-06-23T05:44:58Z CONTRIBUTOR

I'm seeing the problem on database page. Index page and table page runs quite fast.

  • Tables have <10 columns (id, url, title, body_html, date, author, meta (for keeping unstructured json)). I've added index on date columns (using sqlite-utils) in addition to the index present on id columns.
  • All tables have FTS enabled on text and varchar columns (title, body_html vs) to speed up searching.
  • There are couple of tables related with foreign keys (think a thread in a forum and posts in that thread, related with thread_id)
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Database page loads too slowly with many large tables (due to table counts) 642572841
647894903 https://github.com/simonw/datasette/issues/859#issuecomment-647894903 https://api.github.com/repos/simonw/datasette/issues/859 MDEyOklzc3VlQ29tbWVudDY0Nzg5NDkwMw== simonw 9599 2020-06-23T04:07:59Z 2020-06-23T04:07:59Z OWNER

Just to check: are you seeing the problem on this page: https://latest.datasette.io/fixtures (the database page) - or this page (the table page): https://latest.datasette.io/fixtures/compound_three_primary_keys

If it's the table page then the problem may well be #862.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Database page loads too slowly with many large tables (due to table counts) 642572841
647890619 https://github.com/simonw/datasette/issues/859#issuecomment-647890619 https://api.github.com/repos/simonw/datasette/issues/859 MDEyOklzc3VlQ29tbWVudDY0Nzg5MDYxOQ== simonw 9599 2020-06-23T03:48:21Z 2020-06-23T03:48:21Z OWNER
sqlite-generate many-cols.db --tables 2 --rows 200000 --columns 50

Looks like that will take 35 minutes to run (it's not a particularly fast tool).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Database page loads too slowly with many large tables (due to table counts) 642572841
647890378 https://github.com/simonw/datasette/issues/859#issuecomment-647890378 https://api.github.com/repos/simonw/datasette/issues/859 MDEyOklzc3VlQ29tbWVudDY0Nzg5MDM3OA== simonw 9599 2020-06-23T03:47:19Z 2020-06-23T03:47:19Z OWNER

I generated a 600MB database using sqlite-generate just now - with 100 tables at 100,00 rows and 3 tables at 1,000,000 rows - and performance of the database page was fine, 250ms.

Those tables only had 4 columns each though.

You said "200k+, 50+ rows in a couple of tables" - does that mean 50+ columns? I'll try with larger numbers of columns and see what difference that makes.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Database page loads too slowly with many large tables (due to table counts) 642572841
647194131 https://github.com/simonw/datasette/issues/859#issuecomment-647194131 https://api.github.com/repos/simonw/datasette/issues/859 MDEyOklzc3VlQ29tbWVudDY0NzE5NDEzMQ== abdusco 3243482 2020-06-21T23:15:54Z 2020-06-21T23:26:09Z CONTRIBUTOR

I'm not sure if table counts are to blame. There shouldn't be a ~3 orders of magnitude difference.

user@klein /a/w/scrapyard (master)> set sql "select count(*) from table_1; select count(*) from table_2; select count(*) from table_3;"
user@klein /a/w/scrapyard (master)> time sqlite3 scrapyard.db "$sql"
187489
46492
2229

________________________________________________________
Executed in   25.57 millis    fish           external
   usr time    3.55 millis    0.00 micros    3.55 millis
   sys time   22.42 millis  1123.00 micros   21.30 millis

but not letting datasette count the tables definitely helps.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Database page loads too slowly with many large tables (due to table counts) 642572841
647189948 https://github.com/simonw/datasette/issues/859#issuecomment-647189948 https://api.github.com/repos/simonw/datasette/issues/859 MDEyOklzc3VlQ29tbWVudDY0NzE4OTk0OA== simonw 9599 2020-06-21T22:30:12Z 2020-06-21T22:30:43Z OWNER

I'll write a little script which generates a 300MB SQLite file with a bunch of tables with lots of randomly generated rows in to help test this.

Having a tool like that which can generate larger databases with different gnarly performance characteristics will be useful for other performance work too.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Database page loads too slowly with many large tables (due to table counts) 642572841
647189666 https://github.com/simonw/datasette/issues/859#issuecomment-647189666 https://api.github.com/repos/simonw/datasette/issues/859 MDEyOklzc3VlQ29tbWVudDY0NzE4OTY2Ng== simonw 9599 2020-06-21T22:26:55Z 2020-06-21T22:26:55Z OWNER

This makes a lot of sense. I implemented the mechanism for the index page because I have my own instance of Datasette that was running slow, but it had a dozen database files attached to it. I've not run into this with a single giant database file but it absolutely makes sense that the same optimization would be necessary for the database page there too.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Database page loads too slowly with many large tables (due to table counts) 642572841
647135713 https://github.com/simonw/datasette/issues/859#issuecomment-647135713 https://api.github.com/repos/simonw/datasette/issues/859 MDEyOklzc3VlQ29tbWVudDY0NzEzNTcxMw== abdusco 3243482 2020-06-21T14:30:02Z 2020-06-21T14:30:02Z CONTRIBUTOR

Oops, the same method is called from both index and database pages. But removing select count queries speed up the page load quite a bit.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Database page loads too slowly with many large tables (due to table counts) 642572841

Advanced export

JSON shape: default, array, newline-delimited, object

CSV options:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);
Powered by Datasette · Query took 27.517ms · About: github-to-sqlite