id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,pull_request,body,repo,type,active_lock_reason,performed_via_github_app,reactions,draft,state_reason
1855885427,I_kwDOBm6k_c5unpBz,2143,De-tangling Metadata before Datasette 1.0,15178711,open,0,,,24,2023-08-18T00:51:50Z,2023-08-24T18:28:27Z,,CONTRIBUTOR,,"Metadata in Datasette is a really powerful feature, but is a bit difficult to work with. It was initially a way to add ""metadata"" about your ""data"" in Datasette instances, like descriptions for databases/tables/columns, titles, source URLs, licenses, etc. But it later became the go-to spot for other Datasette features that have nothing to do with metadata, like permissions/plugins/canned queries. 

Specifically, I've found the following problems when working with Datasette metadata:

1. Metadata cannot be updated without re-starting the entire Datasette instance.
2. The `metadata.json`/`metadata.yaml` has become a kitchen sink of unrelated (imo) features like plugin config, authentication config, canned queries
3. The Python APIs for defining extra metadata are a bit awkward (the `datasette.metadata()` class, `get_metadata()` hook, etc.)

## Possible solutions

Here's a few ideas of Datasette core changes we can make to address these problems. 

### Re-vamp the Datasette Python metadata APIs

The Datasette object has a single `datasette.metadata()` method that's a bit difficult to work with. There's also no Python API for inserted new metadata, so plugins have to rely on the `get_metadata()` hook.

The `get_metadata()` hook can also be improved - it doesn't work with async functions yet, so you're quite limited to what you can do.

(I'm a bit fuzzy on what to actually do here, but I imagine it'll be very small breaking changes to a few Python methods)

### Add an optional `datasette_metadata` table

Datasette should detect and use metadata stored in a new special table called `datasette_metadata`. This would be a regular table that a user can edit on their own, and would serve as a ""live updating"" source of metadata, than can be changed while the Datasette instance is running.

Not too sure what the schema would look like, but I'd imagine:

```sql
CREATE TABLE datasette_metadata(
  level text,
  target any,
  key text,
  value any,
  primary key (level, target)
)
```

Every row in this table would map to a single metadata ""entry"".

- `level` would be one of ""datasette"", ""database"", ""table"", ""column"", which is the ""level"" the entry describes. For example, `level=""table""` means it is metadata about a specific table, `level=""database""` for a specific database, or `level=""datasette""` for the entire Datasette instance.
- `target` would ""point"" to the specific object the entry metadata is about, and would depend on what `level` is specific. 
  - `level=""database""`: `target` would be the string name of the database that the metadata entry is about. ex `""fixtures""`
  - `level=""table""`: `target` would be a JSON array of two strings. The first element would be the database name, and the second would be the table name. ex `[""fixtures"", ""students""]`
  - `level=""column""`: `target` would be a JSON array of 3 strings: The database name, table name, and column name. Ex `[""fixtures"", ""students"", ""student_id""`]
- `key` would be the type of metadata entry the row has, similar to the current ""keys"" that exist in `metadata.json`. Ex `""about_url""`, `""source""`, `""description""`, etc
- `value` would be the text value of be metadata entry. The literal text value of a description, about_url, column_label, etc

A quick sample:

level | target | key | value
-- | -- | -- | --
datasette | NULL | title | my datasette title...
db | fixtures | source | <description of my database source>
table | [""fixtures"", ""students""] | label_column | student_name
column | [""fixtures"", ""students"", ""birthdate""] | description | <description of the fixtures.students.birthdate column>

This `datasette_metadata` would be configured with other tools, and hopefully not manually by end users. Datasette Core could also offer a UI for editing entries in `datasette_metadata`, to update descriptions/columns on the fly.

### Re-vamp `metadata.json` and move non-metadata config to another place

The motivation behind this is that it's awkward that `metadata.json` contains config about things that are not strictly metadata, including:

- Plugin configuration
- [Authentication/permissions](https://docs.datasette.io/en/latest/authentication.html#access-permissions-in-metadata) (ex the `allow` key on datasettes/databases/tables
- Canned queries. might be controversial, but in my mind, canned queries are application-specific code and configuration, and don't describe the data that exists in SQLite databases. 

I think we should move these outside of `metadata.json` and into a different file. The `datasette.json` idea in  #2093 may be a good solution here: plugin/permissions/canned queries can be defined in `datasette.json`, while `metadata.json`/`datasette_metadata` will strictly be about documenting databases/tables/columns. 
",107914493,issue,,,"{""url"": ""https://api.github.com/repos/simonw/datasette/issues/2143/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
1864112887,PR_kwDOBm6k_c5Yo7bk,2151,Test Datasette on multiple SQLite versions,15178711,open,0,,,1,2023-08-23T22:42:51Z,2023-08-23T22:58:13Z,,CONTRIBUTOR,simonw/datasette/pulls/2151,"still testing, hope it works!

<!-- readthedocs-preview datasette start -->
----
:books: Documentation preview :books:: https://datasette--2151.org.readthedocs.build/en/2151/

<!-- readthedocs-preview datasette end -->",107914493,pull,,,"{""url"": ""https://api.github.com/repos/simonw/datasette/issues/2151/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1,
1861812208,PR_kwDOBm6k_c5YhH-W,2149,"Start a new `datasette.yaml` configuration file, with settings support",15178711,closed,0,,,2,2023-08-22T16:24:16Z,2023-08-23T01:26:11Z,2023-08-23T01:26:11Z,CONTRIBUTOR,simonw/datasette/pulls/2149,"refs #2093 #2143 

This is the first step to implementing the new `datasette.yaml`/`datasette.json` configuration file. 

- The old `--config` argument is now back, and is the path to a `datasette.yaml` file. Acts like the `--metadata` flag.
- The old `settings.json` behavior has been removed.
- The `""settings""` key inside `datasette.yaml` defines the same `--settings` flags
- Values passed in `--settings` will over-write values in `datasette.yaml`

Docs for the Config file is pretty light, not much to add until we add more config to the file.

<!-- readthedocs-preview datasette start -->
----
:books: Documentation preview :books:: https://datasette--2149.org.readthedocs.build/en/2149/

<!-- readthedocs-preview datasette end -->",107914493,pull,,,"{""url"": ""https://api.github.com/repos/simonw/datasette/issues/2149/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",0,
1821108702,I_kwDOCGYnMM5si-ne,579,Special handling for SQLite column of type `JSON`,15178711,open,0,,,0,2023-07-25T20:37:23Z,2023-07-25T20:37:23Z,,CONTRIBUTOR,,"`sqlite-utils` should detect and have specially handling for column with a `JSON` column. For example:

```sql
CREATE TABLE ""dogs"" (
  id INTEGER PRIMARY KEY,
  name TEXT,
  friends JSON 
);
```

## Automatic Nesting

According to [""Nested JSON Values""](https://sqlite-utils.datasette.io/en/stable/cli.html#nested-json-values), sqlite-utils will only expand JSON if the `--json-cols` flag is passed. It looks like it'll try to `json.load` all text column to test if its JSON, which can get expensive on non-json columns. 

Instead, `sqlite-utils` should be default (ie without the `--json-cols` flags) do the `maybe_json()` operation on columns with a declared `JSON` type. So the above table would expand the `""friends""` column as expected, withoutthe `--json-cols` flag:

```bash
sqlite-utils dogs.db ""select * from dogs"" | python -mjson.tool
```

```
[
    {
        ""id"": 1,
        ""name"": ""Cleo"",
        ""friends"": [
            {
                ""name"": ""Pancakes""
            },
            {
                ""name"": ""Bailey""
            }
        ]
    }
]
```

---

I'm sure there's other ways `sqlite-utils` can specially handle JSON columns, so keeping this open while I think of more",140912432,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/579/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
1801394744,I_kwDOCGYnMM5rXxo4,567,Plugin system,15178711,closed,0,,,9,2023-07-12T17:02:14Z,2023-07-22T22:59:37Z,2023-07-22T22:59:36Z,CONTRIBUTOR,,"I'd like there to be a plugin system for sqlite-utils, similar to the datasette/llm plugins. I'd like to make plugins that would do things like:

- Register SQLite extensions for more SQL functions + virtual tables
- Register new subcommands
- Different input file formats for `sqlite-utils memory`
- Different output file formats (in addition to `--csv` `--tsv` `--nl` etc.

A few real-world use-cases of plugins I'd like to see in sqlite-utils:

- Register many of my sqlite extensions in sqlite-utils (`sqlite-http`, `sqlite-lines`, `sqlite-regex`, etc.)
- New subcommands to work with `sqlite-vss` vector tables
- Input/ouput Parquet/Avro/Arrow IPC files with `sqlite-arrow`",140912432,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/567/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed
1816917522,PR_kwDOCGYnMM5WJ6Jm,573,feat: Implement a prepare_connection plugin hook,15178711,closed,0,,,4,2023-07-22T22:48:44Z,2023-07-22T22:59:09Z,2023-07-22T22:59:09Z,CONTRIBUTOR,simonw/sqlite-utils/pulls/573,"Just like the [Datasette prepare_connection hook](https://docs.datasette.io/en/stable/plugin_hooks.html#prepare-connection-conn-database-datasette), this PR adds a similar hook for the `sqlite-utils` plugin system. 

The sole argument is `conn`, since I don't believe a `database` or `datasette` argument would be relevant here. 

I want to do this so I can release `sqlite-utils` plugins for my [SQLite extensions](https://github.com/asg017/sqlite-ecosystem), similar to the Datasette plugins I've release for them. 

An example plugin: https://gist.github.com/asg017/d7cdf0d56e2be87efda28cebee27fa3c

```bash
$ sqlite-utils install https://gist.github.com/asg017/d7cdf0d56e2be87efda28cebee27fa3c/archive/5f5ad549a40860787629c69ca120a08c32519e99.zip

$ sqlite-utils memory 'select hello(""alex"") as response'
[{""response"": ""Hello, alex!""}]
```
Refs:
- #574 

<!-- readthedocs-preview sqlite-utils start -->
----
:books: Documentation preview :books:: https://sqlite-utils--573.org.readthedocs.build/en/573/

<!-- readthedocs-preview sqlite-utils end -->",140912432,pull,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/573/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",0,
1783304750,I_kwDOBm6k_c5qSxIu,2094,JS Plugin Hooks for the Code Editor,15178711,open,0,,,0,2023-07-01T00:51:57Z,2023-07-01T00:51:57Z,,CONTRIBUTOR,,"When #2052 merges, I'd like to add support to add extensions/functions to the Datasette code editor. 

I'd eventually like to build a JS plugin for [`sqlite-docs`](https://github.com/asg017/sqlite-docs), to add things like:

- Inline documentation for tables/columns on hover
- Inline docs for custom functions that are loaded in
- More detailed autocomplete for tables/columns/functions

I did some hacking to see what this would look like, see here:

<img width=""1223"" alt=""image"" src=""https://github.com/simonw/datasette/assets/15178711/64f95cbc-1492-4365-896f-b88c6d08a649"">
<img width=""1223"" alt=""image"" src=""https://github.com/simonw/datasette/assets/15178711/73e602ba-5f45-417a-997e-5aea1738527a"">

There can be a new hook that allows JS plugins to add new ""extension"" in the CodeMirror editorview here:

https://github.com/simonw/datasette/blob/8cd60fd1d899952f1153460469b3175465f33f80/datasette/static/cm-editor-6.0.1.js#L25

Will need some more planning. For example, the Codemirror bundle in Datasette has functions that we could re-export for plugins to use (so we don't load 2 version of `""@codemirror/autocomplete""`, for example. ",107914493,issue,,,"{""url"": ""https://api.github.com/repos/simonw/datasette/issues/2094/reactions"", ""total_count"": 1, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 1, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
1620515757,I_kwDOBm6k_c5glxut,2039,Subtle bug with `--load-extension` and `--static` flags with absolute Windows paths with`C:\`,15178711,open,0,,,0,2023-03-12T21:18:52Z,2023-03-12T21:18:52Z,,CONTRIBUTOR,,"From the Datasette discord: A user tried running the following command on windows:

```
datasette  --load-extension=""C:\spatialite\mod_spatialite-5.0.1-win-x86\mod_spatialite.dll""
```
This failed with `""The specified module could not be found""`, because the entrypoint option introduced in #1789 splits the input differently. Instead of loading the extension found at `""C:\spatialite\mod_spatialite-5.0.1-win-x86\mod_spatialite.dll""`, it instead tried to load the extension at `""C""` with entrypoint `""\spatialite\mod_spatialite-5.0.1-win-x86\mod_spatialite.dll"". 

This is hard because most absolute windows paths have a colon in them, like `C:\foo.txt` or `D:\bar.txt`.  I'd image the `--static` flag is also vulnerable to this type of bug.


The ""solution"" is to use a relative path instead, but that doesn't feel that great. ",107914493,issue,,,"{""url"": ""https://api.github.com/repos/simonw/datasette/issues/2039/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
1344823170,PR_kwDOBm6k_c49e3_k,1789,Add new entrypoint option to `--load-extension`,15178711,closed,0,,,9,2022-08-19T19:27:47Z,2022-08-23T18:42:52Z,2022-08-23T18:34:30Z,CONTRIBUTOR,simonw/datasette/pulls/1789,"Closes #1784 

The `--load-extension` flag can now accept an optional ""entrypoint"" value, to specify which entrypoint SQLite should load from the given extension. 

```bash
# would load default entrypoint like before
datasette data.db --load-extension ext

# loads the extensions with the ""sqlite3_foo_init"" entrpoint
datasette data.db --load-extension ext:sqlite3_foo_init

# loads the extensions with the ""sqlite3_bar_init"" entrpoint
datasette data.db --load-extension ext:sqlite3_bar_init
```

For testing, I added a small SQLite extension in C at `tests/ext.c`. If compiled, then pytest will run the unit tests in `test_load_extensions.py`to verify that Datasette loads in extensions correctly (and loads the correct entrypoints). Compiling the extension requires a C compiler, I compiled it on my Mac with:

```
gcc ext.c -I path/to/sqlite -fPIC -shared -o ext.dylib
```

Where `path/to/sqlite` is a directory that contains the SQLite amalgamation header files.

Re documentation: I added a bit to the help text for `--load-extension` (which I believe should auto-add to documentation?), and the existing extension documentation is spatialite specific. Let me know if a new extensions documentation page would be helpful!",107914493,pull,,,"{""url"": ""https://api.github.com/repos/simonw/datasette/issues/1789/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",0,
1339663518,I_kwDOBm6k_c5P2aSe,1784,"Include ""entrypoint"" option on `--load-extension`?",15178711,closed,0,,,2,2022-08-16T00:22:57Z,2022-08-23T18:34:31Z,2022-08-23T18:34:31Z,CONTRIBUTOR,,"## Problem 

SQLite extensions have the option to define multiple ""entrypoints"" in each loadable extension. For example, the upcoming version of `sqlite-lines` will have 2 entrypoints: the default `sqlite3_lines_init` (which SQLite will automatically guess for) and `sqlite3_lines_noread_init`. The `sqlite3_lines_noread_init` version omits functions that read from the filesystem, which is necessary for security purposes when running untrusted SQL (which Datasette does).

(Similar multiple entrypoints will also be added for sqlite-http).



The `--load-extension` flag, however, doesn't give the option to specify a different entrypoint, so the default one is always used. 

## Proposal

I want there to be a new command line option of the `--load-extension` flag to specify a custom entrypoint like so:
```
datasette my.db \
  --load-extension ./lines0 sqlite3_lines0_noread_init
```

Then, under the hood, this line of code:

https://github.com/simonw/datasette/blob/7af67b54b7d9bca43e948510fc62f6db2b748fa8/datasette/app.py#L562

Would look something like this:

```python
 conn.execute(""SELECT load_extension(?, ?)"", [extension, entrypoint]) 
```

One potential problem: For backward compatibility, I'm not sure if Click allows cli flags to have variable number of options (""arity""). So I guess it could also use a `:` delimiter like `--static`:

```
datasette my.db \
  --load-extension ./lines0:sqlite3_lines0_noread_init
```

Or maybe even a new flag name?

```
datasette my.db \
  --load-extension-entrypoint ./lines0 sqlite3_lines0_noread_init
```


Personally I prefer the `:` option... and maybe even `--load-extension` -> `--load`? Definitely out of scope for this issue tho

```
datasette my.db \
  --load./lines0:sqlite3_lines0_noread_init
```",107914493,issue,,,"{""url"": ""https://api.github.com/repos/simonw/datasette/issues/1784/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed
1060631257,I_kwDOBm6k_c4_N_LZ,1528,"Add new `""sql_file""` key to Canned Queries in metadata?",15178711,open,0,,,3,2021-11-22T21:58:01Z,2022-06-10T03:23:08Z,,CONTRIBUTOR,,"Currently for canned queries, you have to inline SQL in your `metadata.yaml` like so:

```yaml
databases:
  fixtures:
    queries:
      neighborhood_search:
        sql: |-
          select neighborhood, facet_cities.name, state
          from facetable
            join facet_cities on facetable.city_id = facet_cities.id
          where neighborhood like '%' || :text || '%'
          order by neighborhood
        title: Search neighborhoods
```

This works fine, but for a few reasons, I usually have my canned queries already written in separate `.sql` files. I'd like to instead re-use those instead of re-writing it. 

So, I'd like to see a new `""sql_file""` key that works like so:

`metadata.yaml`:

```yaml
databases:
  fixtures:
    queries:
      neighborhood_search:
        sql_file: neighborhood_search.sql
        title: Search neighborhoods
```
`neighborhood_search.sql`:
```sql
select neighborhood, facet_cities.name, state
from facetable
join facet_cities on facetable.city_id = facet_cities.id
where neighborhood like '%' || :text || '%'
order by neighborhood
```

Both of these would work in the exact same way, where Datasette would instead open + include `neighborhood_search.sql` on startup. 


A few reasons why I'd like to keep my canned queries SQL separate from metadata.yaml:

- Keeping SQL in standalone SQL files means syntax highlighting and other text editor integrations in my code
- Multiline strings in yaml, while functional, are a tad cumbersome and are hard to edit
- Works well with other tools (can pipe `.sql` files into the `sqlite3` CLI, or use with other SQLite clients easier)
- Typically my canned queries are quite long compared to everything else in my metadata.yaml, so I'd love to separate it where possible

Let me know if this is a feature you'd like to see, I can try to send up a PR if this sounds right!",107914493,issue,,,"{""url"": ""https://api.github.com/repos/simonw/datasette/issues/1528/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
751195017,MDU6SXNzdWU3NTExOTUwMTc=,1111,Accessing a database's `.json` is slow for very large SQLite files,15178711,open,0,,,3,2020-11-26T00:27:27Z,2021-01-04T19:57:53Z,,CONTRIBUTOR,,"I have a SQLite DB that's pretty large, 23GB and something like 300 million rows. I expect that most queries I run on it will be slow, which is fine, but there are some things that Datasette does that makes working with the DB very slow. Specifically, when I access the `.json` metadata for a table (which I believe it comes from `datasette/views/database.py`, it takes 43 seconds for the request to come in:

```bash
$ time curl localhost:9999/out.json
{""database"": ""out"", ""size"": 24291454976, ""tables"": [{""name"": ""PageviewsHour"", ""columns"": [""file"", ""code"", ""page"", ""pageviews""], ""primary_keys"": [], ""count"": null, ""hidden"": false, ""fts_table"": null, ""foreign_keys"": {""incoming"": [], ""outgoing"": [{""other_table"": ""PageviewsHourFiles"", ""column"": ""file"", ""other_column"": ""file_id""}]}, ""private"": false}, {""name"": ""PageviewsHourFiles"", ""columns"": [""file_id"", ""filename"", ""sha256"", ""size"", ""day"", ""hour""], ""primary_keys"": [""file_id""], ""count"": null, ""hidden"": false, ""fts_table"": null, ""foreign_keys"": {""incoming"": [{""other_table"": ""PageviewsHour"", ""column"": ""file_id"", ""other_column"": ""file""}], ""outgoing"": []}, ""private"": false}, {""name"": ""sqlite_sequence"", ""columns"": [""name"", ""seq""], ""primary_keys"": [], ""count"": 1, ""hidden"": false, ""fts_table"": null, ""foreign_keys"": {""incoming"": [], ""outgoing"": []}, ""private"": false}], ""hidden_count"": 0, ""views"": [], ""queries"": [], ""private"": false, ""allow_execute_sql"": true, ""query_ms"": 43340.23213386536}
real	0m43.417s
user	0m0.006s
sys	0m0.016s
```

I suspect this is because a `COUNT(*)` is happening under the hood, which, when I run it through sqlite directly, does take around the same time:

```bash
$ time sqlite3 out.db < <(echo ""select count(*) from PageviewsHour;"")
362794272

real	0m44.523s
user	0m2.497s
sys	0m6.703s
```

I'm using the `.json` request in the [Observable Datasette Client](https://observablehq.com/@asg017/datasette-client) to 1) verify that a link passed in is a reachable Datasette instance, and 2) a quick way to look at metadata for a db. A few different solutions I can think of:

1. Have some other endpoint, like `/-/datasette.json` that the Observable Datasette client can fetch from to verify that the passed in URL is a valid Datasette (doesnt solve the slow problem, feel free to split this issue into 2)
2. Have a way to turn off table counts when accessing a database's `.json` view, like `?no_count=1` or something
3. Maybe have a timeout on the `table_counts()` function if it takes too long. which is odd, because it seems like it already does that (I think?), I can debug a little more if that's the case

More than happy to debug further, or send a PR if you like one of the proposals above!",107914493,issue,,,"{""url"": ""https://api.github.com/repos/simonw/datasette/issues/1111/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,