{"html_url": "https://github.com/simonw/datasette/issues/859#issuecomment-905904540", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/859", "id": 905904540, "node_id": "IC_kwDOBm6k_c41_wGc", "user": {"value": 2670795, "label": "brandonrobertz"}, "created_at": "2021-08-25T21:59:14Z", "updated_at": "2021-08-25T21:59:55Z", "author_association": "CONTRIBUTOR", "body": "I did two tests: one with 1000 5-30mb DBs and a second with 20 multi gig DBs. For the second, I created them like so:\r\n`for i in {1..20}; do sqlite-generate db$i.db --tables ${i}00 --rows 100,2000 --columns 5,100 --pks 0 --fks 0; done`\r\n\r\nThis was for deciding whether to use lots of small DBs or to group things into a smaller number of bigger DBs. The second strategy wins.\r\n\r\nBy simply persisting the `_internal` DB to disk, I was able to avoid most of the performance issues I was experiencing previously. (To do this, I changed the `datasette/internal_db.py:init_internal_db` creates to if not exists, and changed the `_internal` DB instantiation in `datasette/app.py:Datasette.__init__` to a path with `is_mutable=True`.) Super rough, but the pages now load so I can continue testing ideas.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 642572841, "label": "Database page loads too slowly with many large tables (due to table counts)"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/859#issuecomment-905900807", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/859", "id": 905900807, "node_id": "IC_kwDOBm6k_c41_vMH", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-08-25T21:51:10Z", "updated_at": "2021-08-25T21:51:10Z", "author_association": "OWNER", "body": "10-20 minutes to populate `_internal`! 
How many databases and tables is that for?\r\n\r\nI may have to rethink the `_internal` mechanism entirely. One possible alternative would be for the Datasette homepage to just show a list of available databases (maybe only if there are more than X connected) and then load in their metadata only the first time they are accessed.\r\n\r\nI need to get my own stress testing rig setup for this.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 642572841, "label": "Database page loads too slowly with many large tables (due to table counts)"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/859#issuecomment-905899177", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/859", "id": 905899177, "node_id": "IC_kwDOBm6k_c41_uyp", "user": {"value": 2670795, "label": "brandonrobertz"}, "created_at": "2021-08-25T21:48:00Z", "updated_at": "2021-08-25T21:48:00Z", "author_association": "CONTRIBUTOR", "body": "Upon first stab, there's two issues here:\r\n- DB/table/row counts (as discussed above). This isn't too bad if the DBs are actually above the MAX limit check.\r\n- Populating the internal DB. On first load of a giant set of DBs, it can take 10-20 mins to populate. 
By altering datasette and persisting the internal DB to disk, this problem is vastly improved, but I'm sure this will cause problems elsewhere.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 642572841, "label": "Database page loads too slowly with many large tables (due to table counts)"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/859#issuecomment-904982056", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/859", "id": 904982056, "node_id": "IC_kwDOBm6k_c418O4o", "user": {"value": 2670795, "label": "brandonrobertz"}, "created_at": "2021-08-24T21:15:04Z", "updated_at": "2021-08-24T21:15:30Z", "author_association": "CONTRIBUTOR", "body": "I'm running into issues with this as well. All other pages seem to work with lots of DBs except the home page, which absolutely tanks. Would be willing to put some work into this, if there's been any kind of progress on concepts on how this ought to work.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 642572841, "label": "Database page loads too slowly with many large tables (due to table counts)"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/859#issuecomment-647922203", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/859", "id": 647922203, "node_id": "MDEyOklzc3VlQ29tbWVudDY0NzkyMjIwMw==", "user": {"value": 3243482, "label": "abdusco"}, "created_at": "2020-06-23T05:44:58Z", "updated_at": "2021-01-05T08:22:43Z", "author_association": "CONTRIBUTOR", "body": "I'm seeing the problem on database page. Index page and table page runs quite fast.\r\n\r\n- Tables have <10 columns (`id`, `url`, `title`, `body_html`, `date`, `author`, `meta` (for keeping unstructured json)). 
I've added index on `date` columns (using `sqlite-utils`) in addition to the index present on `id` columns. \r\n- All tables have FTS enabled on `text` and `varchar` columns (`title`, `body_html` etc) to speed up searching.\r\n- There are couple of tables related with foreign keys (think a thread in a forum and posts in that thread, related with `thread_id`)\r\n\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 642572841, "label": "Database page loads too slowly with many large tables (due to table counts)"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/859#issuecomment-652160909", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/859", "id": 652160909, "node_id": "MDEyOklzc3VlQ29tbWVudDY1MjE2MDkwOQ==", "user": {"value": 3243482, "label": "abdusco"}, "created_at": "2020-07-01T03:09:32Z", "updated_at": "2020-07-01T03:10:21Z", "author_association": "CONTRIBUTOR", "body": "I've just realized Datasette tries to count hidden tables too. There are 5 visible tables, 25 hidden tables, which I haven't realize earlier to consider their effect. 
I've turned off counting for hidden tables to see if it has any effect.\r\n\r\nWhat's the point of counting FTS tables?", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 642572841, "label": "Database page loads too slowly with many large tables (due to table counts)"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/859#issuecomment-648669523", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/859", "id": 648669523, "node_id": "MDEyOklzc3VlQ29tbWVudDY0ODY2OTUyMw==", "user": {"value": 3243482, "label": "abdusco"}, "created_at": "2020-06-24T08:13:23Z", "updated_at": "2020-06-24T10:30:36Z", "author_association": "CONTRIBUTOR", "body": "I tried setting `cache_size_kb=0` then `cache_size_kb=100000`, still getting this behavior. I even changed `Database::table_counts` and lowered time limit to 1\r\n\r\n```py\r\ntable_count = (\r\n await self.execute(\r\n \"select count(*) from [{}]\".format(table),\r\n custom_time_limit=1,\r\n )\r\n).rows[0][0]\r\ncounts[table] = table_count\r\n```\r\n\r\nI feel like 10 seconds is a magic number, like a processing timeout and datasette gives up and returns the page. \r\nIndex page loads instantly, table page, query page, as well. 
But when I return to database page after some time, it loads in 10s.\r\n\r\nEDIT:\r\n\r\nIt's always like 10 + 0.3s, like 10s wait and timeout then 300ms to render the page", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 642572841, "label": "Database page loads too slowly with many large tables (due to table counts)"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/859#issuecomment-648234787", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/859", "id": 648234787, "node_id": "MDEyOklzc3VlQ29tbWVudDY0ODIzNDc4Nw==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-06-23T15:22:51Z", "updated_at": "2020-06-23T15:22:51Z", "author_association": "OWNER", "body": "I wonder if this is a SQLite caching issue then?\n\nDatasette has a configuration option for this but I haven't spent much time experimenting with it so I don't know how much of an impact it can have: https://datasette.readthedocs.io/en/stable/config.html#cache-size-kb", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 642572841, "label": "Database page loads too slowly with many large tables (due to table counts)"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/859#issuecomment-648232645", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/859", "id": 648232645, "node_id": "MDEyOklzc3VlQ29tbWVudDY0ODIzMjY0NQ==", "user": {"value": 3243482, "label": "abdusco"}, "created_at": "2020-06-23T15:19:53Z", "updated_at": "2020-06-23T15:19:53Z", "author_association": "CONTRIBUTOR", "body": "The issue seems to appear sporadically, like when I return to database page after a while, during which some records have been added to the database.\r\n\r\nI've just 
visited the database page: the first visit took ~10s, consecutive visits took 0.3s.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 642572841, "label": "Database page loads too slowly with many large tables (due to table counts)"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/859#issuecomment-648163272", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/859", "id": 648163272, "node_id": "MDEyOklzc3VlQ29tbWVudDY0ODE2MzI3Mg==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-06-23T13:52:23Z", "updated_at": "2020-06-23T13:52:23Z", "author_association": "OWNER", "body": "I'm chunking inserts at 100 at a time right now: https://github.com/simonw/sqlite-utils/blob/4d9a3204361d956440307a57bd18c829a15861db/sqlite_utils/db.py#L1030\r\n\r\nI think the performance is more down to using Faker to create the test data - generating millions of entirely fake, randomized records takes a fair bit of time.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 642572841, "label": "Database page loads too slowly with many large tables (due to table counts)"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/859#issuecomment-647925594", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/859", "id": 647925594, "node_id": "MDEyOklzc3VlQ29tbWVudDY0NzkyNTU5NA==", "user": {"value": 3243482, "label": "abdusco"}, "created_at": "2020-06-23T05:55:21Z", "updated_at": "2020-06-23T06:28:29Z", "author_association": "CONTRIBUTOR", "body": "Hmm, not seeing the problem now. \r\nI've removed the commented out sections in `database.py` and restarted the process. 
Database page now loads in <250ms.\r\n\r\nI have couple of workers that check some pages regularly and scrape new content and save to the DB. Could it be that datasette tries to recount tables every time database size changes? Normally it keeps a count cache, but as DB gets updated so often (new content every 5 min or so) it's practically recounting every time I go to the database page?\r\n\r\nEDIT: \r\nIt turns out it doesn't hold cache with mutable databases.\r\n\r\nI'll update the issue with more findings and a better way to reproduce the problem if I encounter it again.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 642572841, "label": "Database page loads too slowly with many large tables (due to table counts)"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/859#issuecomment-647936117", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/859", "id": 647936117, "node_id": "MDEyOklzc3VlQ29tbWVudDY0NzkzNjExNw==", "user": {"value": 3243482, "label": "abdusco"}, "created_at": "2020-06-23T06:25:17Z", "updated_at": "2020-06-23T06:25:17Z", "author_association": "CONTRIBUTOR", "body": "> \r\n> \r\n> ```\r\n> sqlite-generate many-cols.db --tables 2 --rows 200000 --columns 50\r\n> ```\r\n> \r\n> Looks like that will take 35 minutes to run (it's not a particularly fast tool).\r\n\r\nTry chunking write operations into batches every 1000 records or so.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 642572841, "label": "Database page loads too slowly with many large tables (due to table counts)"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/859#issuecomment-647935300", "issue_url": 
"https://api.github.com/repos/simonw/datasette/issues/859", "id": 647935300, "node_id": "MDEyOklzc3VlQ29tbWVudDY0NzkzNTMwMA==", "user": {"value": 3243482, "label": "abdusco"}, "created_at": "2020-06-23T06:23:01Z", "updated_at": "2020-06-23T06:23:01Z", "author_association": "CONTRIBUTOR", "body": "> You said \"200k+, 50+ rows in a couple of tables\" - does that mean 50+ columns? I'll try with larger numbers of columns and see what difference that makes.\r\n\r\nAh that was a typo, I meant 50k.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 642572841, "label": "Database page loads too slowly with many large tables (due to table counts)"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/859#issuecomment-647923666", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/859", "id": 647923666, "node_id": "MDEyOklzc3VlQ29tbWVudDY0NzkyMzY2Ng==", "user": {"value": 3243482, "label": "abdusco"}, "created_at": "2020-06-23T05:49:31Z", "updated_at": "2020-06-23T05:49:31Z", "author_association": "CONTRIBUTOR", "body": "I think I should mention that having FTS on all tables mean I have 5 visible, 25 hidden (FTS) tables displayed on database page.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 642572841, "label": "Database page loads too slowly with many large tables (due to table counts)"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/859#issuecomment-647894903", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/859", "id": 647894903, "node_id": "MDEyOklzc3VlQ29tbWVudDY0Nzg5NDkwMw==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-06-23T04:07:59Z", "updated_at": "2020-06-23T04:07:59Z", "author_association": 
"OWNER", "body": "Just to check: are you seeing the problem on this page: https://latest.datasette.io/fixtures (the database page) - or this page (the table page): https://latest.datasette.io/fixtures/compound_three_primary_keys\r\n\r\nIf it's the table page then the problem may well be #862.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 642572841, "label": "Database page loads too slowly with many large tables (due to table counts)"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/859#issuecomment-647890619", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/859", "id": 647890619, "node_id": "MDEyOklzc3VlQ29tbWVudDY0Nzg5MDYxOQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-06-23T03:48:21Z", "updated_at": "2020-06-23T03:48:21Z", "author_association": "OWNER", "body": " sqlite-generate many-cols.db --tables 2 --rows 200000 --columns 50\r\n\r\nLooks like that will take 35 minutes to run (it's not a particularly fast tool).\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 642572841, "label": "Database page loads too slowly with many large tables (due to table counts)"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/859#issuecomment-647890378", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/859", "id": 647890378, "node_id": "MDEyOklzc3VlQ29tbWVudDY0Nzg5MDM3OA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-06-23T03:47:19Z", "updated_at": "2020-06-23T03:47:19Z", "author_association": "OWNER", "body": "I generated a 600MB database using [sqlite-generate](https://github.com/simonw/sqlite-generate) just now - with 100 tables at 100,00 rows and 3 tables at 1,000,000 
rows - and performance of the database page was fine, 250ms.\r\n\r\nThose tables only had 4 columns each though.\r\n\r\nYou said \"200k+, 50+ rows in a couple of tables\" - does that mean 50+ columns? I'll try with larger numbers of columns and see what difference that makes.\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 642572841, "label": "Database page loads too slowly with many large tables (due to table counts)"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/859#issuecomment-647194131", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/859", "id": 647194131, "node_id": "MDEyOklzc3VlQ29tbWVudDY0NzE5NDEzMQ==", "user": {"value": 3243482, "label": "abdusco"}, "created_at": "2020-06-21T23:15:54Z", "updated_at": "2020-06-21T23:26:09Z", "author_association": "CONTRIBUTOR", "body": "I'm not sure if table counts are to blame. 
There shouldn't be a ~3 orders of magnitude difference.\r\n\r\n```fish\r\nuser@klein /a/w/scrapyard (master)> set sql \"select count(*) from table_1; select count(*) from table_2; select count(*) from table_3;\"\r\nuser@klein /a/w/scrapyard (master)> time sqlite3 scrapyard.db \"$sql\"\r\n187489\r\n46492\r\n2229\r\n\r\n________________________________________________________\r\nExecuted in 25.57 millis fish external\r\n usr time 3.55 millis 0.00 micros 3.55 millis\r\n sys time 22.42 millis 1123.00 micros 21.30 millis\r\n```\r\n\r\nbut not letting datasette count the tables definitely helps.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 642572841, "label": "Database page loads too slowly with many large tables (due to table counts)"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/859#issuecomment-647189948", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/859", "id": 647189948, "node_id": "MDEyOklzc3VlQ29tbWVudDY0NzE4OTk0OA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-06-21T22:30:12Z", "updated_at": "2020-06-21T22:30:43Z", "author_association": "OWNER", "body": "I'll write a little script which generates a 300MB SQLite file with a bunch of tables with lots of randomly generated rows in to help test this.\r\n\r\nHaving a tool like that which can generate larger databases with different gnarly performance characteristics will be useful for other performance work too.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 642572841, "label": "Database page loads too slowly with many large tables (due to table counts)"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/859#issuecomment-647189666", "issue_url": 
"https://api.github.com/repos/simonw/datasette/issues/859", "id": 647189666, "node_id": "MDEyOklzc3VlQ29tbWVudDY0NzE4OTY2Ng==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-06-21T22:26:55Z", "updated_at": "2020-06-21T22:26:55Z", "author_association": "OWNER", "body": "This makes a lot of sense. I implemented the mechanism for the index page because I have my own instance of Datasette that was running slow, but it had a dozen database files attached to it. I've not run into this with a single giant database file but it absolutely makes sense that the same optimization would be necessary for the database page there too.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 642572841, "label": "Database page loads too slowly with many large tables (due to table counts)"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/859#issuecomment-647135713", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/859", "id": 647135713, "node_id": "MDEyOklzc3VlQ29tbWVudDY0NzEzNTcxMw==", "user": {"value": 3243482, "label": "abdusco"}, "created_at": "2020-06-21T14:30:02Z", "updated_at": "2020-06-21T14:30:02Z", "author_association": "CONTRIBUTOR", "body": "Oops, the same method is called from both index and database pages. But removing select count queries speeds up the page load quite a bit.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 642572841, "label": "Database page loads too slowly with many large tables (due to table counts)"}, "performed_via_github_app": null}