{"html_url": "https://github.com/simonw/datasette/issues/1605#issuecomment-1332310772", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1605", "id": 1332310772, "node_id": "IC_kwDOBm6k_c5PaXL0", "user": {"value": 25778, "label": "eyeseast"}, "created_at": "2022-11-30T15:06:37Z", "updated_at": "2022-11-30T15:06:37Z", "author_association": "CONTRIBUTOR", "body": "I'll add issues for both and do a documentation PR.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1108671952, "label": "Scripted exports"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1605#issuecomment-1331694246", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1605", "id": 1331694246, "node_id": "IC_kwDOBm6k_c5PYAqm", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-11-30T06:18:41Z", "updated_at": "2022-11-30T06:18:41Z", "author_association": "OWNER", "body": "Those sound to me like they should be promoted to documented, supported internals.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1108671952, "label": "Scripted exports"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1605#issuecomment-1331187551", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1605", "id": 1331187551, "node_id": "IC_kwDOBm6k_c5PWE9f", "user": {"value": 25778, "label": "eyeseast"}, "created_at": "2022-11-29T19:29:42Z", "updated_at": "2022-11-29T19:29:42Z", "author_association": "CONTRIBUTOR", "body": "Interesting. I started a version using metadata like I outlined up top, but I realized that there's no documented way for a plugin to access either metadata or canned queries. 
Or at least, I couldn't find a way.\r\n\r\nThere is this method: https://github.com/simonw/datasette/blob/main/datasette/app.py#L472 but I don't want to rely on it if it's not documented. Same with this: https://github.com/simonw/datasette/blob/main/datasette/app.py#L544\r\n\r\nIf those are safe, I'll build on them. I'm also happy to document them, if that greases the wheels.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1108671952, "label": "Scripted exports"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1605#issuecomment-1328169472", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1605", "id": 1328169472, "node_id": "IC_kwDOBm6k_c5PKkIA", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-11-27T04:32:14Z", "updated_at": "2022-11-27T04:32:14Z", "author_association": "OWNER", "body": "@eyeseast I started work on that plugin: https://github.com/simonw/datasette-export", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1108671952, "label": "Scripted exports"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1605#issuecomment-1072907200", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1605", "id": 1072907200, "node_id": "IC_kwDOBm6k_c4_80PA", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-03-19T00:52:54Z", "updated_at": "2022-03-19T00:53:45Z", "author_association": "OWNER", "body": "Had a thought about the implementation of this: it could make a really neat plugin.\r\n\r\nSomething like `datasette-export` which adds an `export` command using https://docs.datasette.io/en/stable/plugin_hooks.html#register-commands-cli - then you could run:\r\n\r\n datasette export my-export-dir 
mydatabase.db -m metadata.json --template-dir templates/\r\n\r\nAnd the command would then:\r\n\r\n- Create a `Datasette()` instance with those databases/metadata/etc\r\n- Execute `await datasette.client.get(\"/\")` to get the homepage HTML\r\n- Parse the HTML using BeautifulSoup to find all `a[href]`, `link[href]`, `script[src]`, `img[src]` elements that reference a relative path as opposed to one that starts with `http://`\r\n- Write out the homepage to `my-export-dir/index.html`\r\n- Recursively fetch and dump all of the other pages and assets that it found too\r\n\r\nAll of that HTML parsing may be over-complicating things. It could alternatively accept options for which pages you want to export:\r\n\r\n```\r\ndatasette export my-export-dir \\\r\n mydatabase.db -m metadata.json --template-dir templates/ \\\r\n --path / \\\r\n --path /mydatabase ...\r\n```\r\n\r\nOr a really wild option: it could allow you to define the paths you want to export using a SQL query:\r\n\r\n```\r\ndatasette export my-export-dir \\\r\n mydatabase.db -m metadata.json --template-dir templates/ \\\r\n --sql \"\r\nselect '/' as path, 'index.html' as filename\r\n union all\r\nselect '/mydatabase/articles/' || id as path, 'article-' || id || '.html' as filename\r\nfrom articles\r\n union all\r\nselect '/mydatabase/tags/' || tag as path, 'tag-' || tag || '.html' as filename\r\nfrom tags\r\n\"\r\n```\r\nWhich would save these files:\r\n- `index.html` as the content of `/`\r\n- `article-1.html` (and more) as the content of `/mydatabase/articles/1`\r\n- `tag-python.html` (and more) as the content of `/mydatabase/tags/python`", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1108671952, "label": "Scripted exports"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1605#issuecomment-1018778667", "issue_url": 
"https://api.github.com/repos/simonw/datasette/issues/1605", "id": 1018778667, "node_id": "IC_kwDOBm6k_c48uVQr", "user": {"value": 25778, "label": "eyeseast"}, "created_at": "2022-01-21T19:00:01Z", "updated_at": "2022-01-21T19:00:01Z", "author_association": "CONTRIBUTOR", "body": "Let me know if you want help prototyping any of this, because I'm thinking about it and trying stuff out. Happy to be a sounding board, if it helps.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1108671952, "label": "Scripted exports"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1605#issuecomment-1018766727", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1605", "id": 1018766727, "node_id": "IC_kwDOBm6k_c48uSWH", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-21T18:41:21Z", "updated_at": "2022-01-21T18:42:03Z", "author_association": "OWNER", "body": "Yeah I think this all hinges on:\r\n- #1101 \r\n\r\nAlso this comment about streaming full JSON arrays (not just newline-delimited) using [this trick](https://til.simonwillison.net/python/output-json-array-streaming):\r\n- https://github.com/simonw/datasette/issues/1356#issuecomment-1017016553\r\n\r\nI'm about ready to figure these out, as with so much it's still a little bit blocked on the refactor stuff from:\r\n- #1518\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1108671952, "label": "Scripted exports"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1605#issuecomment-1018741262", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1605", "id": 1018741262, "node_id": "IC_kwDOBm6k_c48uMIO", "user": {"value": 25778, "label": "eyeseast"}, "created_at": 
"2022-01-21T18:05:09Z", "updated_at": "2022-01-21T18:05:09Z", "author_association": "CONTRIBUTOR", "body": "Thinking about this more, as well as #1356 and various other tickets related to output formats, I think there's a missing plugin hook for formatting results, separate from `register_output_renderer` (or maybe part of it, depending on #1101). \r\n\r\nRight now, as I understand it, getting output in any format goes through the normal view stack -- a table, a row or a query -- and so by the time `register_output_renderer` gets it, the results have already been truncated or paginated. What I'd want, I think, is to be able to register ways to format results independent of where those results are sent.\r\n\r\nIt's possible this could be done using [`conn.row_factory`](https://docs.python.org/3/library/sqlite3.html#sqlite3.Connection.row_factory) (maybe in the `prepare_connection` hook), but I'm not sure that's where it belongs.\r\n\r\nAnother option is some kind of registry of serializers, which `register_output_renderer` and other plugin hooks could use. 
What I'm trying to avoid here is writing a plugin that also needs plugins for formats I haven't thought of yet.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1108671952, "label": "Scripted exports"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1605#issuecomment-1016994329", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1605", "id": 1016994329, "node_id": "IC_kwDOBm6k_c48nhoZ", "user": {"value": 25778, "label": "eyeseast"}, "created_at": "2022-01-20T00:27:17Z", "updated_at": "2022-01-20T00:27:17Z", "author_association": "CONTRIBUTOR", "body": "Right now, I usually have a line in a Makefile like this:\r\n\r\n```make\r\ncombined.geojson: project.db\r\n pipenv run datasette project.db --get /project/combined.geojson \\\r\n --load-extension spatialite \\\r\n --setting sql_time_limit_ms 5000 \\\r\n --setting max_returned_rows 20000 \\\r\n -m metadata.yml > $@\r\n```\r\n\r\nThat all assumes I've loaded whatever I need into `project.db` and created a canned query called `combined` (and then uses `datasette-geojson` for geojson output). \r\n\r\nIt works, but as you can see, it's a lot to manage, a lot of boilerplate, and it wasn't obvious how to get there. If there's an error in the canned query, I get an HTML error page, so that's hard to debug. And it's only one query, so each output needs a line like this. Make isn't ideal, either, for that reason.\r\n\r\nThe thing I really liked with `datafreeze` was doing templated filenames. 
I have a project now where I need to export a bunch of little geojson files, based on queries, and it would be awesome to be able to do something like this:\r\n\r\n```yml\r\ndatabases:\r\n project:\r\n queries:\r\n boundaries:\r\n sql: \"SELECT * FROM boundaries\"\r\n filename: \"boundaries/{id}.geojson\"\r\n mode: \"item\"\r\n format: geojson\r\n```\r\n\r\nAnd then do:\r\n\r\n```sh\r\ndatasette freeze -m metadata.yml project.db\r\n```\r\n\r\nFor HTML export, maybe there's a `template` argument, or `format: template` or something. And that gets you a static site generator, kinda for free.\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1108671952, "label": "Scripted exports"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1605#issuecomment-1016977725", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1605", "id": 1016977725, "node_id": "IC_kwDOBm6k_c48ndk9", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-19T23:55:08Z", "updated_at": "2022-01-19T23:55:08Z", "author_association": "OWNER", "body": "Oh that's interesting. I was thinking about this from a slightly different angle recently - pondering what a static site generator built on top of Datasette might look like.\r\n\r\nJust a sketch at the moment, but I was imagining a YAML configuration file with a SQL query that returns a list of paths - then a tool that runs that query and uses the equivalent of `datasette --get` to create a static copy of each of those paths.\r\n\r\nI think these two ideas can probably be merged. 
I'd love to know more about how you are solving this right now!", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1108671952, "label": "Scripted exports"}, "performed_via_github_app": null}