{"html_url": "https://github.com/simonw/datasette/issues/2143#issuecomment-1683420879", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/2143", "id": 1683420879, "node_id": "IC_kwDOBm6k_c5kVvbP", "user": {"value": 9599, "label": "simonw"}, "created_at": "2023-08-18T06:33:24Z", "updated_at": "2023-08-18T15:15:34Z", "author_association": "OWNER", "body": "I completely agree: metadata is a mess, and it deserves our attention.\r\n\r\n> 1. Metadata cannot be updated without re-starting the entire Datasette instance.\r\n\r\nThat's not completely true - there are hacks around that. I have a plugin that applies one set of gnarly hacks for that here: https://github.com/simonw/datasette-remote-metadata - it's pretty grim though!\r\n\r\n> 2. The `metadata.json`/`metadata.yaml` has become a kitchen sink of unrelated (imo) features like plugin config, authentication config, canned queries\r\n\r\n100% this: it's a complete mess.\r\n\r\nDatasette used to have a `datasette --config foo:bar` mechanism, which I deprecated in favour of `datasette --setting foo bar` partly because I wanted to free up `--config` for pointing at a real config file, so we could stop dropping everything in `--metadata metadata.yml`.\r\n\r\n> 3. 
The Python APIs for defining extra metadata are a bit awkward (the `datasette.metadata()` class, `get_metadata()` hook, etc.)\r\n\r\nYes, they're not pretty at all.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1855885427, "label": "De-tangling Metadata before Datasette 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/2143#issuecomment-1683429959", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/2143", "id": 1683429959, "node_id": "IC_kwDOBm6k_c5kVxpH", "user": {"value": 9599, "label": "simonw"}, "created_at": "2023-08-18T06:43:33Z", "updated_at": "2023-08-18T15:19:07Z", "author_association": "OWNER", "body": "The single biggest design challenge I've had with metadata relates to how it should or should not be inherited.\r\n\r\nIf you apply a license to a Datasette instance, it feels like that should flow down to cover all of the databases and all of the tables within those databases.\r\n\r\nIf the license is at the database level, it should cover all tables.\r\n\r\nBut... should source do the same thing? I made it behave the same way as license, but it's presumably common for one database to have a single license but multiple different sources of data.\r\n\r\nThen there's title - should that inherit? It feels like title should apply to only one level - you may want a title that applies to the instance, then a different custom title for databases and tables.\r\n\r\nHere's the current state of play for metadata: https://docs.datasette.io/en/1.0a3/metadata.html\r\n\r\nSo there's `title` and `description` - and I'll be honest, I'm not 100% sure even I understand how those should be inherited down by tables/etc.\r\n\r\nThere's `description_html` which over-rides the `description` if it is set. 
It's a useful customization hack, but a bit surprising.\r\n\r\nThen there are these six:\r\n\r\n- `license`\r\n- `license_url`\r\n- `source`\r\n- `source_url`\r\n- `about`\r\n- `about_url`\r\n\r\nI added `about` later than the others, because I realized that plenty of my own projects needed a link to an article explaining them somewhere - e.g. https://scotrail.datasette.io/\r\n\r\nTables can also have column descriptions - just a string for each column. There's a demo of those here: https://latest.datasette.io/fixtures/roadside_attractions\r\n\r\nAnd then there's all of the other stuff, most of which feels much more like \"settings\" than \"metadata\":\r\n\r\n- `sort: created` - the custom sort order\r\n- `size: 10` for a custom page size for a specific table\r\n- `sortable_columns` to set which columns can be used to sort\r\n- `hidden: true` to hide a table\r\n- `label_column: title` is an interesting one - it lets you hint to Datasette which column should be displayed when there is a foreign key relationship. It's sort-of-metadata and sort-of-a-setting.\r\n- `facets` sets default facets, see https://docs.datasette.io/en/1.0a3/facets.html#facets-in-metadata\r\n- `facet_size` sets the number of facets to display\r\n- `fts_table` and `fts_pk` can be used to configure FTS, especially for views: https://docs.datasette.io/en/1.0a3/full_text_search.html\r\n\r\nAnd the authentication stuff! `allow` and `allow_sql` blocks: https://docs.datasette.io/en/1.0a3/authentication.html#defining-permissions-with-allow-blocks\r\n\r\nAnd the new `permissions` key in the 1.0 alphas: https://docs.datasette.io/en/1.0a3/authentication.html#other-permissions-in-metadata\r\n\r\nI think that might be everything (excluding the `plugins` settings stuff, which is also a bad fit for metadata.)\r\n\r\nAnd to make things even more confusing... I believe you can add arbitrary key/value pairs to your metadata and then use them in your templates! 
I think I've heard from at least one person who uses that ability.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1855885427, "label": "De-tangling Metadata before Datasette 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/2143#issuecomment-1683435579", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/2143", "id": 1683435579, "node_id": "IC_kwDOBm6k_c5kVzA7", "user": {"value": 9599, "label": "simonw"}, "created_at": "2023-08-18T06:49:39Z", "updated_at": "2023-08-18T06:49:39Z", "author_association": "OWNER", "body": "My ideal situation then would be something like this:\r\n\r\n- Metadata itself is VERY clearly described, including sensible rules for metadata inheritance where it makes sense. There is a `datasette.X` method for accessing it which is much more intuitive than `datasette.metadata()`.\r\n- It's possible that method should be an `async` method, because that would support things like plugins that lookup metadata in database tables better.\r\n- All templates etc switch to the new, clean, intuitive metadata mechanism before 1.0.\r\n- I'm interested in the option of metadata being able to live in a `_datasette_metadata` table in the databases themselves - either as a plugin or as a core feature. I think it makes a lot of sense for metadata to optionally live with the data that it describes.\r\n- Configuration gets split from metadata. 
The stuff that configures Datasette no longer lives in the `metadata.yml` file - it lives in `config.yml` (or even `datasette.yml`).\r\n\r\nCurrently we have three types of things:\r\n\r\n- Metadata - information about the data\r\n- Configuration - stuff like \"these columns should be sortable\" and \"this is configured as `fts_table`\" and suchlike\r\n- Settings - the stuff that you pass to `datasette --setting x y` on server start.\r\n\r\nShould settings and configuration be separate? I'm not 100% sure that they should - maybe those two concepts should be combined somehow.\r\n\r\nConfiguration directory mode needs to be considered too: https://docs.datasette.io/en/stable/settings.html#configuration-directory-mode - interestingly it already has a thing where it can pick up settings from a `settings.json` file - where settings are things like `datasette --setting sql_time_limit_ms 4000`.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1855885427, "label": "De-tangling Metadata before Datasette 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/2143#issuecomment-1683440597", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/2143", "id": 1683440597, "node_id": "IC_kwDOBm6k_c5kV0PV", "user": {"value": 9599, "label": "simonw"}, "created_at": "2023-08-18T06:54:49Z", "updated_at": "2023-08-18T06:54:49Z", "author_association": "OWNER", "body": "A related point that I've been considering a lot recently: it turns out that sometimes I really want to define settings on the CLI instead of in a file, purely for convenience.\r\n\r\nIt's pretty annoying when I want to try out a new plugin but I have to create a dedicated `metadata.yml` file for it just to setup a single option - I'd love to have the option to be able to run this instead:\r\n\r\n```bash\r\ndatasette data.db --plugin-setting 
datasette-upload-csvs default-database data\r\n```\r\n\r\nSo maybe there's a world in which all of the settings can be applied in a `datasette.yml` file OR with command-line options.\r\n\r\nThat gets trickier when you need to pass a nested structure or similar, but we could always support those as JSON:\r\n\r\n```bash\r\ndatasette data.db --plugin-setting datasette-emoji-reactions emoji '[\"\ud83d\ude3c\", \"\ud83d\udc3a\"]'\r\n```\r\nNote that we kind of have precedent for this in `datasette publish`: https://docs.datasette.io/en/stable/publish.html#custom-metadata-and-plugins\r\n\r\n```bash\r\ndatasette publish heroku my_database.db \\\r\n --name my-heroku-app-demo \\\r\n --install=datasette-auth-github \\\r\n --plugin-secret datasette-auth-github client_id your_client_id \\\r\n --plugin-secret datasette-auth-github client_secret your_client_secret\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1855885427, "label": "De-tangling Metadata before Datasette 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/2143#issuecomment-1683443891", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/2143", "id": 1683443891, "node_id": "IC_kwDOBm6k_c5kV1Cz", "user": {"value": 9599, "label": "simonw"}, "created_at": "2023-08-18T06:58:15Z", "updated_at": "2023-08-18T06:58:15Z", "author_association": "OWNER", "body": "Hah, that `--plugin-secret` thing was a messy solution I came up with to the problem that all metadata is visible at `/-/metadata` - so if you need to stash a secret you need a way to keep it not-visible in there!\r\n\r\nHence the whole `$env` mess: https://docs.datasette.io/en/stable/plugins.html#secret-configuration-values\r\n\r\n```json\r\n{\r\n \"plugins\": {\r\n \"datasette-auth-github\": {\r\n \"client_secret\": {\r\n \"$env\": \"GITHUB_CLIENT_SECRET\"\r\n }\r\n }\r\n 
}\r\n}\r\n```\r\n\r\nIf configuration and metadata were separate we could ditch that whole messy situation - configuration can stay hidden, metadata can stay public.\r\n\r\nThough I have been thinking that Datasette might benefit from a \"secrets\" mechanism that's separate from configuration and metadata... kind of like what LLM has: https://llm.datasette.io/en/stable/help.html#llm-keys-help", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1855885427, "label": "De-tangling Metadata before Datasette 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/2143#issuecomment-1684202932", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/2143", "id": 1684202932, "node_id": "IC_kwDOBm6k_c5kYuW0", "user": {"value": 15178711, "label": "asg017"}, "created_at": "2023-08-18T17:10:21Z", "updated_at": "2023-08-18T17:10:21Z", "author_association": "CONTRIBUTOR", "body": "I agree with all your points!\r\n\r\nI think the best solution would be having a `datasette.json` config file, where you \"configure\" your datasette instances, with settings, permissions/auth, plugin configuration, and table settings (sortable column, label columns, etc.). Which #2093 would do.\r\n\r\nThen optionally, you have a `metadata.json`, or use `datasette_metadata`, or some other plugin to define metadata (ex the future [sqlite-docs](https://github.com/asg017/sqlite-docs) plugin).\r\n\r\nEverything in `datasette.json` could also be overwritten by CLI flags, like `--setting key value`, `--plugin xxxx key value`.\r\n\r\nWe could even completely remove `settings.json` in favor of just `datasette.json`. Mostly because I think the fewer files the better, especially if they have generic names like `settings.json` or `config.json`. 
", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1855885427, "label": "De-tangling Metadata before Datasette 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/2143#issuecomment-1684205563", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/2143", "id": 1684205563, "node_id": "IC_kwDOBm6k_c5kYu_7", "user": {"value": 15178711, "label": "asg017"}, "created_at": "2023-08-18T17:12:54Z", "updated_at": "2023-08-18T17:12:54Z", "author_association": "CONTRIBUTOR", "body": "Another option would be, instead of flat `datasette.json`/`datasette.yaml` files, we could instead use a Python file, like `datasette_config.py`. That way one could dynamically generate config (ex dev vs prod, auto-discover credentials, etc.). Kinda like Django settings.\r\n\r\nThough I imagine Python imports might make this complex to do, and json/yaml is already supported and pretty easy to write\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1855885427, "label": "De-tangling Metadata before Datasette 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/2143#issuecomment-1684484426", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/2143", "id": 1684484426, "node_id": "IC_kwDOBm6k_c5kZzFK", "user": {"value": 9599, "label": "simonw"}, "created_at": "2023-08-18T22:12:52Z", "updated_at": "2023-08-18T22:12:52Z", "author_association": "OWNER", "body": "Yeah, I'm convinced by that. There's not point in having both `settings.json` and `datasette.json`.\r\n\r\nI like `datasette.json` ( / `datasette.yml`) as a name. 
That can be the file that lives in your config directory too, so if you run `datasette .` in a folder containing `datasette.yml` all of those settings get picked up.\r\n\r\nHere's a thought for how it could look - I'll go with the YAML format because I expect that to be the default most people use, just because it supports multi-line strings better.\r\n\r\nI based this on the big example at https://docs.datasette.io/en/1.0a3/metadata.html#using-yaml-for-metadata - and combined some bits from https://docs.datasette.io/en/1.0a3/authentication.html as well.\r\n\r\n```yaml\r\ntitle: Demonstrating Metadata from YAML\r\ndescription_html: |-\r\n

This description includes a long HTML string

\r\n \r\n\r\nsettings:\r\n  default_page_size: 10\r\n  max_returned_rows: 3000\r\n  sql_time_limit_ms: 8000\r\n\r\ndatabases:\r\n  docs:\r\n    permissions:\r\n      create-table:\r\n        id: editor\r\n  fixtures:\r\n    tables:\r\n      no_primary_key:\r\n        hidden: true\r\n    queries:\r\n      neighborhood_search:\r\n        sql: |-\r\n          select neighborhood, facet_cities.name, state\r\n          from facetable join facet_cities on facetable.city_id = facet_cities.id\r\n          where neighborhood like '%' || :text || '%' order by neighborhood;\r\n        title: Search neighborhoods\r\n        description_html: |-

          This demonstrates basic LIKE search\r\n\r\npermissions:\r\n  debug-menu:\r\n    id: '*'\r\n\r\nplugins:\r\n  datasette-ripgrep:\r\n    path: /usr/local/lib/python3.11/site-packages\r\n```\r\nI'm inclined to say we try to be a superset of the existing `metadata.yml` format, at least where it makes sense to do so. That way the upgrade path is smooth for people. Also, I don't think the format itself is terrible - it's the name that's the big problem.\r\n\r\nIn this example I've mixed in one extra concept: that `settings:` block with a bunch of settings in it.\r\n\r\nThere are some things in there that look a little bit like metadata - the `title` and `description_html` fields.\r\n\r\nBut _are they_ metadata? The title and description of the overall instance feel like they could be described as general configuration. The stuff for the `query` should live where the query itself is defined.\r\n\r\nNote that queries can be defined by a plugin hook too: https://docs.datasette.io/en/1.0a3/plugin_hooks.html#canned-queries-datasette-database-actor\r\n\r\nWhat do you think? 
Is this the right direction, or are you thinking there's a more radical redesign that would make sense here?", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1855885427, "label": "De-tangling Metadata before Datasette 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/2143#issuecomment-1684485591", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/2143", "id": 1684485591, "node_id": "IC_kwDOBm6k_c5kZzXX", "user": {"value": 9599, "label": "simonw"}, "created_at": "2023-08-18T22:14:35Z", "updated_at": "2023-08-18T22:14:35Z", "author_association": "OWNER", "body": "Actually there is one thing that I'm not comfortable about with respect to the existing design: the way the database / tables stuff is nested.\r\n\r\nThey assume that the user will attach the database to Datasette using a fixed name - `docs.db` or whatever.\r\n\r\nBut what if we want to support users downloading databases from each other and attaching them to Datasette where those DBs might carry some of their own configuration?\r\n\r\nMoving metadata into the databases makes sense there, but what about database-specific settings like the default sort order for a table, or configured canned queries?\r\n\r\nHaving those tied to the filename of the database itself feels unpleasant to me. 
But how else could we handle this?", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1855885427, "label": "De-tangling Metadata before Datasette 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/2143#issuecomment-1684488526", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/2143", "id": 1684488526, "node_id": "IC_kwDOBm6k_c5kZ0FO", "user": {"value": 9599, "label": "simonw"}, "created_at": "2023-08-18T22:18:39Z", "updated_at": "2023-08-18T22:18:39Z", "author_association": "OWNER", "body": "> Another option would be, instead of flat `datasette.json`/`datasette.yaml` files, we could instead use a Python file, like `datasette_config.py`. That way one could dynamically generate config (ex dev vs prod, auto-discover credentials, etc.). Kinda like Django settings.\r\n\r\nI'm not a fan of that. I feel like software history is full of examples of projects that implemented configuration-as-code and then later regretted it - the most recent example is `setup.py` in Python turning into `pyproject.toml`, but I feel like I've seen that pattern play out elsewhere too.\r\n\r\nI don't think having people dynamically generate JSON/YAML for their configuration is a big burden. I'd have to see some very compelling use-cases to convince me otherwise.\r\n\r\nThat said, I do really like a bias towards settings that can be changed at runtime. 
Datasette has suffered a bit from some settings that can't be easily changed at runtime already - hence my gnarly https://github.com/simonw/datasette-remote-metadata plugin.\r\n\r\nFor things like Datasette Cloud, for example, the more people can configure without rebooting their container the better!\r\n\r\nI don't think live reconfiguration at runtime is incompatible with JSON/YAML configuration though. Caddy is one of my favourite examples of software that can be entirely re-configured at runtime by POSTing a big blob of JSON to it: https://caddyserver.com/docs/quick-starts/api\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1855885427, "label": "De-tangling Metadata before Datasette 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/2143#issuecomment-1684496274", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/2143", "id": 1684496274, "node_id": "IC_kwDOBm6k_c5kZ1-S", "user": {"value": 15178711, "label": "asg017"}, "created_at": "2023-08-18T22:30:45Z", "updated_at": "2023-08-18T22:30:45Z", "author_association": "CONTRIBUTOR", "body": "> That said, I do really like a bias towards settings that can be changed at runtime\r\n\r\nDoes this include things like `--setting` values or plugin config? I can totally see being able to update metadata without restarting, but not sure if that would work well with `--setting`, plugin config, or auth/permissions stuff. \r\n\r\nWell it could work with `--setting` and auth/permissions, with a lot of core changes. But changing plugin config on the fly could be challenging for plugin authors. 
", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1855885427, "label": "De-tangling Metadata before Datasette 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/2143#issuecomment-1685259985", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/2143", "id": 1685259985, "node_id": "IC_kwDOBm6k_c5kcwbR", "user": {"value": 11784304, "label": "dvizard"}, "created_at": "2023-08-20T11:27:21Z", "updated_at": "2023-08-20T11:27:21Z", "author_association": "NONE", "body": "To chime in from a poweruser perspective: I'm worried that this is an overengineering trap. Yes, the current solution is somewhat messy. But there are datasette-wide settings, there are database-scope settings, there are table-scope settings etc, but then there are database-scope metadata and table-scope metadata. Trying to cleanly separate \"settings\" from \"configuration\" is, I believe, an uphill fight. Even separating db/table-scope settings from pure descriptive metadata is not always easy. Like, do canned queries belong to database metadata or to settings? Do I need two separate files for this?\r\n\r\nOne pragmatic solution I used in a project is stacking yaml configuration files. Basically, have an arbitrary number of yaml or json settings files that you load in a specified order. Every file adds to the corresponding settings in the earlier-loaded file (if it already existed). I implemented this myself but found later that there is an existing Python \"cascading dict\" type of thing, I forget what it's called. There is a bit of a challenge deciding whether there is \"replacement\" or \"addition\" (I think I pragmatically ran `update` on the second level of the dict but better solutions are certainly possible). 
\r\n\r\nThis way, one allows separation of settings into different blocks, while not imposing a specific idea of what belongs where that might not apply equally to all cases.\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1855885427, "label": "De-tangling Metadata before Datasette 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/2143#issuecomment-1685260244", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/2143", "id": 1685260244, "node_id": "IC_kwDOBm6k_c5kcwfU", "user": {"value": 11784304, "label": "dvizard"}, "created_at": "2023-08-20T11:29:00Z", "updated_at": "2023-08-20T11:29:00Z", "author_association": "NONE", "body": "https://docs.python.org/3/library/collections.html#collections.ChainMap", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1855885427, "label": "De-tangling Metadata before Datasette 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/2143#issuecomment-1685260624", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/2143", "id": 1685260624, "node_id": "IC_kwDOBm6k_c5kcwlQ", "user": {"value": 11784304, "label": "dvizard"}, "created_at": "2023-08-20T11:31:16Z", "updated_at": "2023-08-20T11:31:16Z", "author_association": "NONE", "body": "https://pypi.org/project/deep-chainmap/", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1855885427, "label": "De-tangling Metadata before Datasette 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/2143#issuecomment-1685263948", "issue_url": 
"https://api.github.com/repos/simonw/datasette/issues/2143", "id": 1685263948, "node_id": "IC_kwDOBm6k_c5kcxZM", "user": {"value": 11784304, "label": "dvizard"}, "created_at": "2023-08-20T11:50:10Z", "updated_at": "2023-08-20T11:50:10Z", "author_association": "NONE", "body": "This also makes it simple to separate out secrets.\r\n\r\n`datasette --config settings.yaml --config secrets.yaml --config db-docs.yaml --config db-fixtures.yaml`\r\n\r\nsettings.yaml\r\n```\r\nsettings:\r\n default_page_size: 10\r\n max_returned_rows: 3000\r\n sql_time_limit_ms\": 8000\r\nplugins:\r\n datasette-ripgrep:\r\n path: /usr/local/lib/python3.11/site-packages\r\n```\r\n\r\nsecrets.yaml\r\n```\r\nplugins:\r\n datasette-auth-github:\r\n client_secret: SUCH_SECRET \r\n```\r\n\r\n\r\ndb-docs.yaml\r\n```\r\ndatabases:\r\n docs:\r\n permissions:\r\n create-table:\r\n id: editor\r\n```\r\n\r\ndb-fixtures.yaml\r\n```\r\ndatabases:\r\n fixtures:\r\n tables:\r\n no_primary_key:\r\n hidden: true\r\n queries:\r\n neighborhood_search:\r\n sql: |-\r\n select neighborhood, facet_cities.name, state\r\n from facetable join facet_cities on facetable.city_id = facet_cities.id\r\n where neighborhood like '%' || :text || '%' order by neighborhood;\r\n title: Search neighborhoods\r\n description_html: |-\r\n

This demonstrates basic LIKE search\r\n```", "reactions": "{\"total_count\": 1, \"+1\": 1, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1855885427, "label": "De-tangling Metadata before Datasette 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/2143#issuecomment-1690787394", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/2143", "id": 1690787394, "node_id": "IC_kwDOBm6k_c5kx15C", "user": {"value": 9599, "label": "simonw"}, "created_at": "2023-08-23T23:52:02Z", "updated_at": "2023-08-23T23:52:02Z", "author_association": "OWNER", "body": "> This also makes it simple to separate out secrets.\r\n> \r\n> `datasette --config settings.yaml --config secrets.yaml --config db-docs.yaml --config db-fixtures.yaml`\r\n\r\nHaving multiple configs that combine in that way is a really interesting direction.\r\n\r\n> To chime in from a poweruser perspective: I'm worried that this is an overengineering trap. Yes, the current solution is somewhat messy. But there are datasette-wide settings, there are database-scope settings, there are table-scope settings etc, but then there are database-scope metadata and table-scope metadata. Trying to cleanly separate \"settings\" from \"configuration\" is, I believe, an uphill fight.\r\n\r\nI'm very keen on separating out the \"metadata\" - where metadata is the slimmest possible set of things, effectively the data license and the source and the column and table descriptions - from everything else, mainly because I want metadata to be able to travel with the data.\r\n\r\nOne idea that's been discussed before is having an optional mechanism for storing metadata in the SQLite database file itself - potentially in a `_datasette_metadata` table. 
That way you could distribute a DB file and anyone who opened it in Datasette would also see the correct metadata about it.\r\n\r\nThat's why I'm so keen on splitting out metadata from all of the other stuff - settings and plugin configuration and authentication rules.\r\n\r\nSo really it becomes \"true metadata\" v.s. \"all of the other junk that's accumulated in metadata and `settings.json`\".", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1855885427, "label": "De-tangling Metadata before Datasette 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/2143#issuecomment-1690792514", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/2143", "id": 1690792514, "node_id": "IC_kwDOBm6k_c5kx3JC", "user": {"value": 9599, "label": "simonw"}, "created_at": "2023-08-24T00:00:16Z", "updated_at": "2023-08-24T00:02:55Z", "author_association": "OWNER", "body": "I've been thinking about what it might look like to allow command-line arguments to be used to define _any_ of the configuration options in `datasette.yml`, as alternative and more convenient syntax.\r\n\r\nHere's what I've come up with:\r\n```\r\ndatasette \\\r\n -s settings.sql_time_limit_ms 1000 \\\r\n -s plugins.datasette-auth-tokens.manage_tokens true \\\r\n -s plugins.datasette-auth-tokens.manage_tokens_database tokens \\\r\n -s plugins.datasette-ripgrep.path \"/home/simon/code-to-search\" \\\r\n -s databases.mydatabase.tables.example_table.sort created \\\r\n mydatabase.db tokens.db\r\n```\r\nWhich would be equivalent to `datasette.yml` containing this:\r\n```yaml\r\nplugins:\r\n datasette-auth-tokens:\r\n manage_tokens: true\r\n manage_tokens_database: tokens\r\n datasette-ripgrep:\r\n path: /home/simon/code-to-search\r\ndatabases:\r\n mydatabase:\r\n tables:\r\n example_table:\r\n sort: created\r\nsettings:\r\n sql_time_limit_ms: 
1000\r\n```\r\nHere's a prototype implementation of this:\r\n```python\r\nimport json\r\nfrom typing import Any, List, Tuple\r\n\r\ndef _handle_pair(key: str, value: str) -> dict:\r\n    \"\"\"\r\n    Turn a key-value pair into a nested dictionary.\r\n    foo, bar => {'foo': 'bar'}\r\n    foo.bar, baz => {'foo': {'bar': 'baz'}}\r\n    foo.bar, [1, 2, 3] => {'foo': {'bar': [1, 2, 3]}}\r\n    foo.bar, \"baz\" => {'foo': {'bar': 'baz'}}\r\n    foo.bar, '{\"baz\": \"qux\"}' => {'foo': {'bar': {'baz': 'qux'}}}\r\n    \"\"\"\r\n    try:\r\n        value = json.loads(value)\r\n    except json.JSONDecodeError:\r\n        # If it doesn't parse as JSON, treat it as a string\r\n        pass\r\n\r\n    keys = key.split('.')\r\n    result = current_dict = {}\r\n    \r\n    for k in keys[:-1]:\r\n        current_dict[k] = {}\r\n        current_dict = current_dict[k]\r\n    \r\n    current_dict[keys[-1]] = value\r\n    return result\r\n\r\n\r\ndef _combine(base: dict, update: dict) -> dict:\r\n    \"\"\"\r\n    Recursively merge two dictionaries.\r\n    \"\"\"\r\n    for key, value in update.items():\r\n        if isinstance(value, dict) and key in base and isinstance(base[key], dict):\r\n            base[key] = _combine(base[key], value)\r\n        else:\r\n            base[key] = value\r\n    return base\r\n\r\ndef handle_pairs(pairs: List[Tuple[str, Any]]) -> dict:\r\n    \"\"\"\r\n    Parse a list of key-value pairs into a nested dictionary.\r\n    \"\"\"\r\n    result = {}\r\n    for key, value in pairs:\r\n        parsed_pair = _handle_pair(key, value)\r\n        result = _combine(result, parsed_pair)\r\n    return result\r\n```\r\nExercised like this:\r\n```python\r\nprint(json.dumps(handle_pairs([\r\n    (\"settings.sql_time_limit_ms\", \"1000\"),\r\n    (\"plugins.datasette-auth-tokens.manage_tokens\", \"true\"),\r\n    (\"plugins.datasette-auth-tokens.manage_tokens_database\", \"tokens\"),\r\n    (\"plugins.datasette-ripgrep.path\", \"/home/simon/code-to-search\"),\r\n    (\"databases.mydatabase.tables.example_table.sort\", \"created\"),\r\n]), indent=4))\r\n```\r\nOutput:\r\n```json\r\n{\r\n \"settings\": {\r\n \"sql_time_limit_ms\": 1000\r\n 
},\r\n \"plugins\": {\r\n \"datasette-auth-tokens\": {\r\n \"manage_tokens\": true,\r\n \"manage_tokens_database\": \"tokens\"\r\n },\r\n \"datasette-ripgrep\": {\r\n \"path\": \"/home/simon/code-to-search\"\r\n }\r\n },\r\n \"databases\": {\r\n \"mydatabase\": {\r\n \"tables\": {\r\n \"example_table\": {\r\n \"sort\": \"created\"\r\n }\r\n }\r\n }\r\n }\r\n}\r\n```\r\nNote that `-s` isn't currently an option for `datasette serve`.\r\n\r\n`--setting key value` IS an existing option, but it isn't completely compatible with this because it maps directly just to settings.\r\n\r\nAlthough... we could keep compatibility by saying that if you call `--setting known_setting value` and that `known_setting` is in this list then we treat it as if you said `-s settings.known_setting value` instead:\r\n\r\nhttps://github.com/simonw/datasette/blob/bdf59eb7db42559e538a637bacfe86d39e5d17ca/datasette/app.py#L114-L204", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1855885427, "label": "De-tangling Metadata before Datasette 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/2143#issuecomment-1690799608", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/2143", "id": 1690799608, "node_id": "IC_kwDOBm6k_c5kx434", "user": {"value": 77071, "label": "pkulchenko"}, "created_at": "2023-08-24T00:09:47Z", "updated_at": "2023-08-24T00:10:41Z", "author_association": "NONE", "body": "@simonw, FWIW, I do exactly the same thing for one of my projects (both to allow multiple configuration files to be passed on the command line and setting individual values) and it works quite well for me and my users. I even use the same parameter name for both (https://studio.zerobrane.com/doc-configuration#configuration-via-command-line), but I understand why you may want to use different ones for files and individual values. 
There is one small difference that I accept code snippets, but I don't think it matters much in this case.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1855885427, "label": "De-tangling Metadata before Datasette 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/2143#issuecomment-1690800119", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/2143", "id": 1690800119, "node_id": "IC_kwDOBm6k_c5kx4_3", "user": {"value": 9599, "label": "simonw"}, "created_at": "2023-08-24T00:10:32Z", "updated_at": "2023-08-24T00:39:00Z", "author_association": "OWNER", "body": "Something notable about this design is that, because the values in the key-value pairs are treated as JSON first and then strings only if they don't parse cleanly as JSON, it's possible to represent any structure (including nesting structures) using this syntax. 
You can do things like this if you need to (settings for an imaginary plugin):\r\n\r\n```bash\r\ndatasette data.db \\\r\n -s plugins.datasette-complex-plugin.configs '{\"foo\": [1,2,3], \"bar\": \"baz\"}'\r\n```\r\nWhich would be equivalent to:\r\n```yaml\r\nplugins:\r\n  datasette-complex-plugin:\r\n    configs:\r\n      foo:\r\n        - 1\r\n        - 2\r\n        - 3\r\n      bar: baz\r\n```\r\nThis is a bit different from a previous attempt I made at the same problem: https://github.com/simonw/json-flatten - that used syntax like `foo.bar.[0]$int = 1` to specify an integer as the first item of an array, which is much more complex.\r\n\r\nThat previous design was meant to support round-trips, so you could take any nested JSON object and turn it into an HTML form or query string where every value can have its own form field, then turn the result back again.\r\n\r\nFor the `datasette -s key value` feature we don't need round-tripping with individual values each editable on their own, so we can go with something much simpler.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1855885427, "label": "De-tangling Metadata before Datasette 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/2143#issuecomment-1690800641", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/2143", "id": 1690800641, "node_id": "IC_kwDOBm6k_c5kx5IB", "user": {"value": 9599, "label": "simonw"}, "created_at": "2023-08-24T00:11:16Z", "updated_at": "2023-08-24T00:11:16Z", "author_association": "OWNER", "body": "> @simonw, FWIW, I do exactly the same thing for one of my projects (both to allow multiple configuration files to be passed on the command line and setting individual values) and it works quite well for me and my users. 
I even use the same parameter name for both (https://studio.zerobrane.com/doc-configuration#configuration-via-command-line), but I understand why you may want to use different ones for files and individual values. There is one small difference that I accept code snippets, but I don't think it matters much in this case.\r\n\r\nThat's a neat example, thanks!", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1855885427, "label": "De-tangling Metadata before Datasette 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/2143#issuecomment-1691094870", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/2143", "id": 1691094870, "node_id": "IC_kwDOBm6k_c5kzA9W", "user": {"value": 1238873, "label": "rclement"}, "created_at": "2023-08-24T06:43:40Z", "updated_at": "2023-08-24T06:43:40Z", "author_association": "NONE", "body": "If I may, the \"path-like\" configuration is great but one thing that would be even greater: allowing the same configuration to be provided using environment variables.\r\n\r\nFor instance:\r\n\r\n```\r\ndatasette -s plugins.datasette-complex-plugin.configs '{\"foo\": [1,2,3], \"bar\": \"baz\"}'\r\n```\r\n\r\ncould also be provided using:\r\n\r\n```\r\nexport DS_PLUGINS_DATASETTE-COMPLEX-PLUGIN_CONFIGS='{\"foo\": [1,2,3], \"bar\": \"baz\"}'\r\ndatasette\r\n```\r\n\r\n(I do not like mixing `-` and `_` in env vars but I do not have a better, easily reversible example at the moment)\r\n\r\nFYI, you could take some inspiration from another great open source data project, Metabase:\r\nhttps://www.metabase.com/docs/latest/configuring-metabase/config-file\r\nhttps://www.metabase.com/docs/latest/configuring-metabase/environment-variables", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", 
"issue": {"value": 1855885427, "label": "De-tangling Metadata before Datasette 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/2143#issuecomment-1692180683", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/2143", "id": 1692180683, "node_id": "IC_kwDOBm6k_c5k3KDL", "user": {"value": 9599, "label": "simonw"}, "created_at": "2023-08-24T18:05:17Z", "updated_at": "2023-08-24T18:05:17Z", "author_association": "OWNER", "body": "That's a really good call, thanks @rclement - environment variable configuration totally makes sense here.\r\n\r\nNeed to figure out the right syntax for that. Something like this perhaps:\r\n\r\n```bash\r\nDATASETTE_CONFIG_PLUGINS='{\"datasette-ripgrep\": ...}'\r\n```\r\nHard to know how to make this nestable though. I considered this:\r\n```bash\r\nDATASETTE_CONFIG_PLUGINS_DATASETTE_RIPGREP_PATH='/path/to/code/'\r\n```\r\nBut that doesn't work, because how does the processing code know that it should split on `_` for most of the tokens but NOT split `DATASETTE_RIPGREP`, instead treating that as `datasette-ripgrep`?\r\n\r\nI checked and `-` is not a valid character in an environment variable, at least in zsh on macOS:\r\n```\r\n% export FOO_BAR-BAZ=1\r\nexport: not valid in this context: FOO_BAR-BAZ\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1855885427, "label": "De-tangling Metadata before Datasette 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/2143#issuecomment-1692182910", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/2143", "id": 1692182910, "node_id": "IC_kwDOBm6k_c5k3Kl-", "user": {"value": 9599, "label": "simonw"}, "created_at": "2023-08-24T18:06:57Z", "updated_at": "2023-08-24T18:08:17Z", "author_association": "OWNER", "body": "The other thing that could work is 
something like this:\r\n```bash\r\nexport AUTH_TOKENS_DB=\"tokens\"\r\ndatasette \\\r\n -s settings.sql_time_limit_ms 1000 \\\r\n -s plugins.datasette-auth-tokens.manage_tokens true \\\r\n -e plugins.datasette-auth-tokens.manage_tokens_database AUTH_TOKENS_DB\r\n```\r\nSo `-e` is an alternative version of `-s` which reads from the named environment variable instead of having the value provided directly as the second value in the pair.\r\n\r\nI quite like this, because it could replace the really ugly `$env` pattern we have in plugin configuration at the moment: https://docs.datasette.io/en/1.0a4/plugins.html#secret-configuration-values\r\n```yaml\r\nplugins:\r\n  datasette-auth-github:\r\n    client_secret:\r\n      $env: GITHUB_CLIENT_SECRET\r\n```", "reactions": "{\"total_count\": 1, \"+1\": 1, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1855885427, "label": "De-tangling Metadata before Datasette 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/2143#issuecomment-1692210044", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/2143", "id": 1692210044, "node_id": "IC_kwDOBm6k_c5k3RN8", "user": {"value": 9599, "label": "simonw"}, "created_at": "2023-08-24T18:28:27Z", "updated_at": "2023-08-24T18:28:27Z", "author_association": "OWNER", "body": "Just spotted this: https://github.com/simonw/datasette/blob/17ec309e14f9c2e90035ba33f2f38ecc5afba2fa/datasette/app.py#L328-L332\r\n\r\nhttps://github.com/simonw/datasette/blob/17ec309e14f9c2e90035ba33f2f38ecc5afba2fa/datasette/app.py#L359-L360\r\n\r\nLooks to me like that second bit of code doesn't yet handle `datasette.yml`\r\n\r\nThis code does though:\r\n\r\nhttps://github.com/simonw/datasette/blob/17ec309e14f9c2e90035ba33f2f38ecc5afba2fa/datasette/app.py#L333-L335\r\n\r\n`parse_metadata()` is clearly a bad name for this 
function:\r\n\r\nhttps://github.com/simonw/datasette/blob/d97e82df3c8a3f2e97038d7080167be9bb74a68d/datasette/utils/__init__.py#L980-L990\r\n\r\nThat `@documented` decorator indicates that it's part of the documented API used by plugin authors: https://docs.datasette.io/en/1.0a4/internals.html#parse-metadata-content\r\n\r\nSo we should rename it to something better like `parse_json_or_yaml()` but keep `parse_metadata` as an undocumented alias for that to avoid any unnecessary plugin breaks.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1855885427, "label": "De-tangling Metadata before Datasette 1.0"}, "performed_via_github_app": null}