{"html_url": "https://github.com/simonw/datasette/issues/859#issuecomment-905904540", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/859", "id": 905904540, "node_id": "IC_kwDOBm6k_c41_wGc", "user": {"value": 2670795, "label": "brandonrobertz"}, "created_at": "2021-08-25T21:59:14Z", "updated_at": "2021-08-25T21:59:55Z", "author_association": "CONTRIBUTOR", "body": "I did two tests: one with 1000 5-30mb DBs and a second with 20 multi gig DBs. For the second, I created them like so:\r\n`for i in {1..20}; do sqlite-generate db$i.db --tables ${i}00 --rows 100,2000 --columns 5,100 --pks 0 --fks 0; done`\r\n\r\nThis was for deciding whether to use lots of small DBs or to group things into a smaller number of bigger DBs. The second strategy wins.\r\n\r\nBy simply persisting the `_internal` DB to disk, I was able to avoid most of the performance issues I was experiencing previously. (To do this, I changed the `datasette/internal_db.py:init_internal_db` creates to if not exists, and changed the `_internal` DB instantiation in `datasette/app.py:Datasette.__init__` to a path with `is_mutable=True`.) Super rough, but the pages now load so I can continue testing ideas.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 642572841, "label": "Database page loads too slowly with many large tables (due to table counts)"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/859#issuecomment-905900807", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/859", "id": 905900807, "node_id": "IC_kwDOBm6k_c41_vMH", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-08-25T21:51:10Z", "updated_at": "2021-08-25T21:51:10Z", "author_association": "OWNER", "body": "10-20 minutes to populate `_internal`! How many databases and tables is that for?\r\n\r\nI may have to rethink the `_internal` mechanism entirely. One possible alternative would be for the Datasette homepage to just show a list of available databases (maybe only if there are more than X connected) and then load in their metadata only the first time they are accessed.\r\n\r\nI need to get my own stress testing rig setup for this.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 642572841, "label": "Database page loads too slowly with many large tables (due to table counts)"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/859#issuecomment-905899177", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/859", "id": 905899177, "node_id": "IC_kwDOBm6k_c41_uyp", "user": {"value": 2670795, "label": "brandonrobertz"}, "created_at": "2021-08-25T21:48:00Z", "updated_at": "2021-08-25T21:48:00Z", "author_association": "CONTRIBUTOR", "body": "Upon first stab, there's two issues here:\r\n- DB/table/row counts (as discussed above). This isn't too bad if the DBs are actually above the MAX limit check.\r\n- Populating the internal DB. On first load of a giant set of DBs, it can take 10-20 mins to populate. 
By altering datasette and persisting the internal DB to disk, this problem is vastly improved, but I'm sure this will cause problems elsewhere.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 642572841, "label": "Database page loads too slowly with many large tables (due to table counts)"}, "performed_via_github_app": null}
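A minimal sketch of the persistence change brandonrobertz describes, assuming Datasette's documented `Database` constructor arguments (`path=`, `is_mutable=`) and its `execute_write(..., block=True)` method; the `databases` column list, the `_internal.db` filename, and the variable name in the `app.py` comment are assumptions for illustration, not the actual patch:

```python
# datasette/internal_db.py -- make the DDL idempotent, so a previously
# populated on-disk _internal database is reused instead of re-created.
async def init_internal_db(db):
    await db.execute_write(
        """
        CREATE TABLE IF NOT EXISTS databases (
            database_name TEXT PRIMARY KEY,
            path TEXT,
            is_memory INTEGER,
            schema_version INTEGER
        )
        """,
        block=True,
    )
    # ...same IF NOT EXISTS treatment for the other _internal tables


# datasette/app.py, inside Datasette.__init__ -- swap the in-memory
# _internal database for one backed by a file on disk (variable and
# file names here are hypothetical):
#
#     internal_db = Database(self, path="_internal.db", is_mutable=True)
```

The appeal of this approach is that the 10-20 minute population cost is paid once rather than on every startup; the risk, as the comment anticipates, is that a persisted `_internal` can drift out of sync with the attached databases.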