issues: 751195017

This data as json

id	node_id	number	title	user	state	locked	assignee	milestone	comments	created_at	updated_at	closed_at	author_association	pull_request	body	repo	type	active_lock_reason	performed_via_github_app	reactions	draft	state_reason
751195017	MDU6SXNzdWU3NTExOTUwMTc=	1111	Accessing a database's `.json` is slow for very large SQLite files	15178711	open	0			3	2020-11-26T00:27:27Z	2021-01-04T19:57:53Z		CONTRIBUTOR		I have a SQLite DB that's pretty large, 23GB and something like 300 million rows. I expect that most queries I run on it will be slow, which is fine, but there are some things that Datasette does that makes working with the DB very slow. Specifically, when I access the `.json` metadata for a table (which I believe it comes from `datasette/views/database.py`, it takes 43 seconds for the request to come in: bash $ time curl localhost:9999/out.json {"database": "out", "size": 24291454976, "tables": [{"name": "PageviewsHour", "columns": ["file", "code", "page", "pageviews"], "primary_keys": [], "count": null, "hidden": false, "fts_table": null, "foreign_keys": {"incoming": [], "outgoing": [{"other_table": "PageviewsHourFiles", "column": "file", "other_column": "file_id"}]}, "private": false}, {"name": "PageviewsHourFiles", "columns": ["file_id", "filename", "sha256", "size", "day", "hour"], "primary_keys": ["file_id"], "count": null, "hidden": false, "fts_table": null, "foreign_keys": {"incoming": [{"other_table": "PageviewsHour", "column": "file_id", "other_column": "file"}], "outgoing": []}, "private": false}, {"name": "sqlite_sequence", "columns": ["name", "seq"], "primary_keys": [], "count": 1, "hidden": false, "fts_table": null, "foreign_keys": {"incoming": [], "outgoing": []}, "private": false}], "hidden_count": 0, "views": [], "queries": [], "private": false, "allow_execute_sql": true, "query_ms": 43340.23213386536} real 0m43.417s user 0m0.006s sys 0m0.016s I suspect this is because a `COUNT()` is happening under the hood, which, when I run it through sqlite directly, does take around the same time: ```bash $ time sqlite3 out.db < <(echo "select count() from PageviewsHour;") 362794272 real 0m44.523s user 0m2.497s sys 0m6.703s ``` I'm using the `.json` request in the Observable Datasette Client to 1) verify that a link passed in is a reachable Datasette instance, and 2) a quick way to look at metadata for a db. A few different solutions I can think of: Have some other endpoint, like `/-/datasette.json` that the Observable Datasette client can fetch from to verify that the passed in URL is a valid Datasette (doesnt solve the slow problem, feel free to split this issue into 2) Have a way to turn off table counts when accessing a database's `.json` view, like `?no_count=1` or something Maybe have a timeout on the `table_counts()` function if it takes too long. which is odd, because it seems like it already does that (I think?), I can debug a little more if that's the case More than happy to debug further, or send a PR if you like one of the proposals above!	107914493	issue			{ "url": "https://api.github.com/repos/simonw/datasette/issues/1111/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }

Links from other tables

1 row from issues_id in issues_labels
3 rows from issue in issue_comments