{"id": 470345929, "node_id": "MDU6SXNzdWU0NzAzNDU5Mjk=", "number": 42, "title": "table.extract(...) method and \"sqlite-utils extract\" command", "user": {"value": 9599, "label": "simonw"}, "state": "closed", "locked": 0, "assignee": null, "milestone": {"value": 5897911, "label": "2.20"}, "comments": 21, "created_at": "2019-07-19T14:09:36Z", "updated_at": "2020-09-22T23:39:31Z", "closed_at": "2020-09-22T23:37:49Z", "author_association": "OWNER", "pull_request": null, "body": "One of my favourite features of [csvs-to-sqlite](https://github.com/simonw/csvs-to-sqlite) is that it can \"extract\" columns into a separate lookup table - for example:\r\n\r\n csvs-to-sqlite big_csv_file.csv -c country output.db\r\n\r\nThis will turn the `country` column in the resulting table into a integer foreign key against a new `country` table. You can see an example of what that looks like here: https://san-francisco.datasettes.com/registered-business-locations-3d50679/Business+Corridor was extracted from https://san-francisco.datasettes.com/registered-business-locations-3d50679/Registered_Business_Locations_-_San_Francisco?Business%20Corridor=1\r\n\r\nI'd like to have the same capability in `sqlite-utils` - but with the ability to run it against an existing SQLite table rather than just against a CSV.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/42/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 657572753, "node_id": "MDU6SXNzdWU2NTc1NzI3NTM=", "number": 894, "title": "?sort=colname~numeric to sort by by column cast to real", "user": {"value": 9599, "label": "simonw"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 21, "created_at": "2020-07-15T18:47:48Z", "updated_at": "2021-08-20T02:07:53Z", "closed_at": null, "author_association": "OWNER", "pull_request": null, "body": "If a text column actually contains numbers, being able to \"sort by column, treated as numeric\" would be really useful.\r\n\r\nProbably depends on column actions enabled by #690", "repo": {"value": 107914493, "label": "datasette"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/datasette/issues/894/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 642572841, "node_id": "MDU6SXNzdWU2NDI1NzI4NDE=", "number": 859, "title": "Database page loads too slowly with many large tables (due to table counts)", "user": {"value": 3243482, "label": "abdusco"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 21, "created_at": "2020-06-21T14:23:17Z", "updated_at": "2021-08-25T21:59:55Z", "closed_at": null, "author_association": "CONTRIBUTOR", "pull_request": null, "body": "Hey,\r\nI have a database that I save in HTML from couple of web scrapers. There are around 200k+, 50+ rows in a couple of tables, with sqlite file weighing around 600MB.\r\n\r\nThe app runs on a VPS with 2 core CPU, 4GB RAM and refreshing database page regularly takes more than 10 seconds. 
I suspected that counting tables was the culprit, but manually running `select count(*) from table_name` for the largest table finishes in under a second.\r\n\r\nI've looked at the source code. There's a check on the index page for mutable databases larger than 100MB:\r\nhttps://github.com/simonw/datasette/blob/799c5d53570d773203527f19530cf772dc2eeb24/datasette/views/index.py#L15\r\n\r\nbut this check is not performed for the database page.\r\nI've manually crippled the `Database::table_counts` method:\r\n```py\r\nasync def table_counts(self, limit=10):\r\n    if not self.is_mutable and self.cached_table_counts is not None:\r\n        return self.cached_table_counts\r\n    # Try to get counts for each table, $limit timeout for each count\r\n    counts = {}\r\n    for table in await self.table_names():\r\n        try:\r\n            # table_count = (\r\n            #     await self.execute(\r\n            #         \"select count(*) from [{}]\".format(table),\r\n            #         custom_time_limit=limit,\r\n            #     )\r\n            # ).rows[0][0]\r\n            counts[table] = 10  # table_count\r\n            # In some cases I saw \"SQL Logic Error\" here in addition to\r\n            # QueryInterrupted - so we catch that too:\r\n        except (QueryInterrupted, sqlite3.OperationalError, sqlite3.DatabaseError):\r\n            counts[table] = None\r\n    if not self.is_mutable:\r\n        self.cached_table_counts = counts\r\n    return counts\r\n```\r\n\r\nNow the page loads in <100ms.\r\n\r\nIs it possible to apply the size check to the database page too?\r\n\r\n
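A minimal sketch of what that guard could look like, assuming the same 100MB threshold the index page uses. `table_counts_for_page` is a hypothetical wrapper name; `db.is_mutable` and `db.table_counts()` are the same `Database` attributes used in the snippet above, and `db.size` is assumed to report the database file size in bytes:

```py
# A minimal sketch, not Datasette's actual implementation: skip the
# expensive count(*) queries on the database page for large mutable
# databases, mirroring the 100MB guard the index page already applies.
COUNT_SIZE_LIMIT = 100 * 1024 * 1024  # 100MB, assumed to match index.py

async def table_counts_for_page(db, limit=10):
    # Hypothetical wrapper around Database.table_counts(): for big
    # mutable databases, return None for every table instead of
    # running count(*) against each one.
    if db.is_mutable and db.size > COUNT_SIZE_LIMIT:
        return {table: None for table in await db.table_names()}
    return await db.table_counts(limit=limit)
```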
<details><summary>/-/versions output</summary>\r\n<pre>
\r\n{\r\n    \"python\": {\r\n        \"version\": \"3.8.0\",\r\n        \"full\": \"3.8.0 (default, Oct 28 2019, 16:14:01) \\n[GCC 8.3.0]\"\r\n    },\r\n    \"datasette\": {\r\n        \"version\": \"0.44\"\r\n    },\r\n    \"asgi\": \"3.0\",\r\n    \"uvicorn\": \"0.11.5\",\r\n    \"sqlite\": {\r\n        \"version\": \"3.22.0\",\r\n        \"fts_versions\": [\r\n            \"FTS5\",\r\n            \"FTS4\",\r\n            \"FTS3\"\r\n        ],\r\n        \"extensions\": {\r\n            \"json1\": null\r\n        },\r\n        \"compile_options\": [\r\n            \"COMPILER=gcc-7.4.0\",\r\n            \"ENABLE_COLUMN_METADATA\",\r\n            \"ENABLE_DBSTAT_VTAB\",\r\n            \"ENABLE_FTS3\",\r\n            \"ENABLE_FTS3_PARENTHESIS\",\r\n            \"ENABLE_FTS3_TOKENIZER\",\r\n            \"ENABLE_FTS4\",\r\n            \"ENABLE_FTS5\",\r\n            \"ENABLE_JSON1\",\r\n            \"ENABLE_LOAD_EXTENSION\",\r\n            \"ENABLE_PREUPDATE_HOOK\",\r\n            \"ENABLE_RTREE\",\r\n            \"ENABLE_SESSION\",\r\n            \"ENABLE_STMTVTAB\",\r\n            \"ENABLE_UNLOCK_NOTIFY\",\r\n            \"ENABLE_UPDATE_DELETE_LIMIT\",\r\n            \"HAVE_ISNAN\",\r\n            \"LIKE_DOESNT_MATCH_BLOBS\",\r\n            \"MAX_SCHEMA_RETRY=25\",\r\n            \"MAX_VARIABLE_NUMBER=250000\",\r\n            \"OMIT_LOOKASIDE\",\r\n            \"SECURE_DELETE\",\r\n            \"SOUNDEX\",\r\n            \"TEMP_STORE=1\",\r\n            \"THREADSAFE=1\"\r\n        ]\r\n    }\r\n}\r\n
</pre>\r\n</details>\r\n
", "repo": {"value": 107914493, "label": "datasette"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/datasette/issues/859/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 324835838, "node_id": "MDU6SXNzdWUzMjQ4MzU4Mzg=", "number": 276, "title": "Handle spatialite geometry columns better", "user": {"value": 45057, "label": "russss"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 21, "created_at": "2018-05-21T08:46:55Z", "updated_at": "2022-03-21T22:22:20Z", "closed_at": "2022-03-21T22:22:20Z", "author_association": "CONTRIBUTOR", "pull_request": null, "body": "I'd like to see spatialite geometry columns rendered more sensibly - at the moment they come through as well-known-binary unless you use custom SQL, and WKB isn't of much use to anyone on the web.\r\n\r\nIn HTML: they should be shown either as simple lat/long (if it's just a point, for example), or as a sensible placeholder if they're more complex geometries.\r\n\r\nIn JSON: they should be GeoJSON geometries, (which means they can be automatically fed into a leaflet map with no further messing around).\r\n\r\nIn CSV: they should be WKT.\r\n\r\nI briefly wondered if this should go into a plugin, but I suspect it needs hooking in at a deeper level than the plugin architecture will support any time soon.", "repo": {"value": 107914493, "label": "datasette"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/datasette/issues/276/reactions\", \"total_count\": 1, \"+1\": 1, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 1243498298, "node_id": "I_kwDOBm6k_c5KHkc6", "number": 1746, "title": "Switch documentation theme to Furo", "user": {"value": 9599, "label": "simonw"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 21, "created_at": "2022-05-20T18:42:17Z", "updated_at": "2022-05-20T21:28:29Z", "closed_at": "2022-05-20T21:28:29Z", "author_association": "OWNER", "pull_request": null, "body": "https://github.com/pradyunsg/furo\r\n\r\nI just did this for `shot-scraper` and I really like it: https://shot-scraper.datasette.io/en/latest/\r\n\r\n- https://github.com/simonw/shot-scraper/issues/77", "repo": {"value": 107914493, "label": "datasette"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/datasette/issues/1746/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 777333388, "node_id": "MDU6SXNzdWU3NzczMzMzODg=", "number": 1168, "title": "Mechanism for storing metadata in _metadata tables", "user": {"value": 9599, "label": "simonw"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 21, "created_at": "2021-01-01T18:47:27Z", "updated_at": "2023-09-28T18:29:05Z", "closed_at": null, "author_association": "OWNER", "pull_request": null, "body": "_Original title: Perhaps metadata should all live in a `_metadata` in-memory database_\r\n\r\nInspired by #1150 - metadata should be exposed as an API, and 
for large Datasette instances that API may need to be paginated. So why not expose it through an in-memory database table?\r\n\r\nOne catch to this: plugins. #860 aims to add a plugin hook for metadata. But if the metadata comes from an in-memory table, how do the plugins interact with it?\r\n\r\nThe need to paginate over metadata does make a plugin hook that returns metadata for an individual table seem less wise, since we don't want to have to do 10,000 plugin hook invocations to show a list of all metadata.\r\n\r\nIf those plugins write directly to the in-memory table, how can their contributions survive the server restarting?", "repo": {"value": 107914493, "label": "datasette"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/datasette/issues/1168/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null}
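To make the in-memory `_metadata` idea concrete, here is a hedged sketch using only the standard-library `sqlite3` module. The `_metadata` table name comes from this issue's title, but the schema and the insert/query patterns are assumptions for illustration, not a shipped Datasette API:

```py
import sqlite3

# Assumed schema for the in-memory metadata store floated in issue 1168.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE _metadata (database TEXT, "table" TEXT, key TEXT, value TEXT, '
    'PRIMARY KEY (database, "table", key))'
)

# A metadata plugin hook could bulk-insert its contributions at startup
# (table and values here are made up for the example):
conn.executemany(
    "INSERT INTO _metadata VALUES (?, ?, ?, ?)",
    [
        ("fixtures", "roadside_attractions", "title", "Roadside attractions"),
        ("fixtures", "roadside_attractions", "source", "example.com"),
    ],
)

# The paginated metadata API then becomes a single SQL query, with no
# per-table plugin hook invocations:
rows = conn.execute(
    'SELECT database, "table", key, value FROM _metadata '
    'ORDER BY database, "table", key LIMIT 100'
).fetchall()
print(rows)
```

In this sketch the restart question answers itself: because hooks re-run their inserts on every startup, the in-memory table's contents never need to survive a restart.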