id,node_id,number,title,user,user_label,state,locked,assignee,assignee_label,milestone,milestone_label,comments,created_at,updated_at,closed_at,author_association,pull_request,body,repo,repo_label,type,active_lock_reason,performed_via_github_app,reactions,draft,state_reason
1901483874,PR_kwDOBm6k_c5amULw,2190,"Raise an exception if a ""plugins"" block exists in metadata.json",15178711,asg017,closed,0,,,,,5,2023-09-18T18:08:56Z,2023-10-12T16:20:51Z,2023-10-12T16:20:51Z,CONTRIBUTOR,simonw/datasette/pulls/2190,"refs #2183 #2093

From [this comment](https://github.com/simonw/datasette/pull/2183#issuecomment-1714699724) in #2183: If a `""plugins""` block appears in `metadata.json`, it means that a user hasn't migrated over their plugin configuration from `metadata.json` to `datasette.yaml`, which is a breaking change in Datasette 1.0. 

This PR will ensure that an error is raised whenever that happens.

<!-- readthedocs-preview datasette start -->
----
:books: Documentation preview :books:: https://datasette--2190.org.readthedocs.build/en/2190/

<!-- readthedocs-preview datasette end -->",107914493,datasette,pull,,,"{""url"": ""https://api.github.com/repos/simonw/datasette/issues/2190/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",0,
1901768721,PR_kwDOBm6k_c5anSg5,2191,"Move `permissions`,  `allow` blocks, canned queries and more out of `metadata.yaml` and into `datasette.yaml`",15178711,asg017,closed,0,,,,,4,2023-09-18T21:21:16Z,2023-10-12T16:16:38Z,2023-10-12T16:16:38Z,CONTRIBUTOR,simonw/datasette/pulls/2191,"The PR moves the following fields from `metadata.yaml` to `datasette.yaml`:

```
permissions
allow
allow_sql
queries
extra_css_urls
extra_js_urls
```

This is a significant breaking change that users will need to upgrade their `metadata.yaml` files for. But the format/locations are similar to the previous version, so it shouldn't be too difficult to upgrade.

One note: I'm still working on the Configuration docs, specifically the ""reference"" section. Though it's pretty small, the rest of read to review",107914493,datasette,pull,,,"{""url"": ""https://api.github.com/repos/simonw/datasette/issues/2191/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",0,
1900026059,I_kwDOBm6k_c5xQBjL,2188,"Plugin Hooks for ""compile to SQL"" languages",15178711,asg017,open,0,,,,,2,2023-09-18T01:37:15Z,2023-09-18T06:58:53Z,,CONTRIBUTOR,,"There's a ton of tools/languages that compile to SQL, which may be nice in Datasette. Some examples:

- Logica https://logica.dev
- PRQL https://prql-lang.org
- Malloy, but not sure if it works with SQLite? https://github.com/malloydata/malloy

It would be cool if plugins could extend Datasette to use these languages, in both the code editor and API usage.

A few things I'd imagine a `datasette-prql` or `datasette-logica` plugin would do:

- `prql=` instead of `sql=`
- Code editor support (syntax highlighting, autocomplete)
- Hide/show SQL",107914493,datasette,issue,,,"{""url"": ""https://api.github.com/repos/simonw/datasette/issues/2188/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
1891212159,PR_kwDOBm6k_c5aD33C,2183,`datasette.yaml` plugin support,15178711,asg017,closed,0,,,,,4,2023-09-11T20:26:04Z,2023-09-13T21:06:25Z,2023-09-13T21:06:25Z,CONTRIBUTOR,simonw/datasette/pulls/2183,"Part of #2093

In #2149 , we ported over `""settings.json""` into the new `datasette.yaml` config file, with a top-level `""settings""` key. This PR ports over plugin configuration into top-level `""plugins""` key, as well as nested database/table plugin config.

From now on, no plugin-related configuration is allowed in `metadata.yaml`, and must be in `datasette.yaml` in this new format. This is a pretty significant breaking change. Thankfully, you should be able to copy-paste your legacy plugin key/values into the new `datasette.yaml` format.

An example of what `datasette.yaml` would look like with this new plugin config:

```yaml

plugins:
  datasette-my-plugin:
    config_key: value

databases:
  fixtures:
      plugins: 
        datasette-my-plugin:
          config_key: fixtures-db-value
    tables:
      students:
        plugins:
          datasette-my-plugin:
            config_key: fixtures-students-table-value

```

As an additional benefit, this now works with the new `-s` flag:

```bash
datasette --memory -s 'plugins.datasette-my-plugin.config_key' new_value
```


Marked as a ""Draft"" right now until I add better documentation. We also should have a plan for the next alpha release to document and publicize this change, especially for plugin authors (since their docs will have to change to say `datasette.yaml` instead of `metadata.yaml`

<!-- readthedocs-preview datasette start -->
----
:books: Documentation preview :books:: https://datasette--2183.org.readthedocs.build/en/2183/

<!-- readthedocs-preview datasette end -->",107914493,datasette,pull,,,"{""url"": ""https://api.github.com/repos/simonw/datasette/issues/2183/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",0,
1781530343,I_kwDOBm6k_c5qL_7n,2093,"Proposal: Combine settings, metadata, static, etc. into a single `datasette.yaml` File",15178711,asg017,open,0,,,,,8,2023-06-29T21:18:23Z,2023-09-11T20:19:32Z,,CONTRIBUTOR,,"Very often I get tripped up when trying to configure my Datasette instances. For example: if I want to change the port my app listen too, do I do that with a CLI flag, a `--setting` flag, inside `metadata.json`, or an env var? If I want to up the time limit of SQL statements, is that under `metadata.json` or a setting? Where does my plugin configuration go?

Normally I need to look it up in Datasette docs, and I quickly find my answer, but the number of places where ""config"" goes it overwhelming.

- Flat CLI flags like `--port`, `--host`, `--cors`, etc.
- `--setting`, like `default_page_size`, `sql_time_limit_ms` etc
- Inside `metadata.json`, including plugin configuration

Typically my Datasette deploys are extremely long shell commands, with multiple `--setting` and other CLI flags.

## Proposal: Consolidate all ""config"" into `datasette.toml`

I propose that we add a new `datasette.toml` that combines ""settings"", ""metadata"", and other common CLI flags like `--port` and `--cors` into a single file. It would be similar to ""Cargo.toml"" in Rust projects, ""package.json"" in Node projects, and ""pyproject.toml"" in Python, etc.

A sample of what it could look like:

```toml
# ""top level"" configuration that are currently CLI flags on `datasette serve`
[config]
port = 8020
host = ""0.0.0.0""
cors = true

# replaces multiple `--setting` flags
[settings]
base_url = ""/app/datasette/""
default_allow_sql = true
sql_time_limit_ms = 3500

# replaces `metadata.json`.
# The contents of datasette-metadata.json could be defined in this file instead, but supporting separate files is nice (since those are easy to machine-generate)
[metadata]
include=""./datasette-metadata.json""

# plugin-specific 
[plugins]
[plugins.datasette-auth-github]
client_id = {env = ""DATASETTE_AUTH_GITHUB_CLIENT_ID""}
client_secret = {env = ""GITHUB_CLIENT_SECRET""}

[plugins.datasette-cluster-map]

latitude_column = ""lat""
longitude_column = ""lon""
```

## Pros
- Instead of multiple files and CLI flags, everything could be in one tidy file
- Editing config in a separate file is easier than editing CLI flags, since you don't have to kill a process + edit a command every time
- New users will know ""just edit my `datasette.toml` instead of needing to learn metadata + settings + CLI flags
- Better dev experience for multiple environment. For example, could have `datasette -c datasette-dev.toml` for local dev environments (enables SQL, debug plugins, long timeouts, etc.), and a `datasette -c datasette-prod.toml` for ""production"" (lower timeouts, less plugins, monitoring plugins, etc.)

## Cons
- Yet another config-management system. Now Datasette users will need to know about metadata, settings, CLI flags, _and_ `datasette.toml`. However with enough documentation + announcements + examples, I think we can get ahead of it.
- If toml is chosen, would need to add a toml parser for Python version <3.11
- Multiple sources of config require priority. For example: Would `--setting default_allow_sql off` override the value inside `[settings]`? What about `--port`? 

## Other Notes

### Toml

I chose toml over json because toml supports comments. I chose toml over yaml because Python 3.11 has builtin support for it. I also find toml easier to work with since it doesn't have the odd ""gotchas"" that YAML has (""ex `3.10` resolving to `3.1`, Norway `NO` resolving to `false`, etc.). It also mimics `pyproject.toml` which is nice. Happy to change my mind about this however


### Plugin config will be difficult

Plugin config is currently in `metadata.json` in two places:

1. Top level, under `""plugins.[plugin-name]""`. This fits well into `datasette.toml` as `[plugins.plugin-name]`
2. Table level, under `""databases.[db-name].tables.[table-name].plugins.[plugin-name]`. This doesn't fit that well into `datasette.toml`, unless it's nested under `[metadata]`?

### Extensions, static, one-off plugins?

We could also include equivalents of `--plugins-dir`, `--static`, and `--load-extension` into `datasette.toml`, but I'd imagine there's a few security concerns there to think through. 


### Explicitly list with plugins to use?

I believe Datasette by default will load all install plugins on startup, but maybe `datasette.toml` can specify a list of plugins to use? For example, a dev version of `datasette.toml` can specify `datasette-pretty-traces`, but the prod version can leave it out",107914493,datasette,issue,,,"{""url"": ""https://api.github.com/repos/simonw/datasette/issues/2093/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
1884330740,PR_kwDOBm6k_c5ZszDF,2174,Use $DATASETTE_INTERNAL in absence of --internal,15178711,asg017,open,0,,,,,3,2023-09-06T16:07:15Z,2023-09-08T00:46:13Z,,CONTRIBUTOR,simonw/datasette/pulls/2174,"#refs 2157, specifically [this comment](https://github.com/simonw/datasette/issues/2157#issuecomment-1700291967)

Passing in `--internal my_internal.db` over and over again can get repetitive. 

This PR adds a new configurable env variable `DATASETTE_INTERNAL_DB_PATH`. If it's defined, then it takes place as the path to the internal database. Users can still overwrite this behavior by passing in their own `--internal internal.db` flag.

In draft mode for now, needs tests and documentation. 

Side note: Maybe we can have a sections in the docs that lists all the ""configuration environment variables"" that Datasette respects? I did a quick grep and found:

- `DATASETTE_LOAD_PLUGINS`
- `DATASETTE_SECRETS`


<!-- readthedocs-preview datasette start -->
----
:books: Documentation preview :books:: https://datasette--2174.org.readthedocs.build/en/2174/

<!-- readthedocs-preview datasette end -->",107914493,datasette,pull,,,"{""url"": ""https://api.github.com/repos/simonw/datasette/issues/2174/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",0,
1865869205,I_kwDOBm6k_c5vNueV,2157,"Proposal: Make the `_internal` database persistent, customizable, and hidden",15178711,asg017,open,0,,,,,3,2023-08-24T20:54:29Z,2023-08-31T02:45:56Z,,CONTRIBUTOR,,"The current `_internal` database is used by Datasette core to cache info about databases/tables/columns/foreign keys of databases in a Datasette instance. It's a temporary database created at startup, that can only be seen by the root user. See an [example `_internal` DB here](https://latest.datasette.io/_internal), after [logging in as root](https://latest.datasette.io/login-as-root).

The current `_internal` database has a few rough edges:

- It's part of `datasette.databases`, so many plugins have to specifically exclude `_internal` from their queries [examples here](https://github.com/search?q=datasette+hookimpl+%22_internal%22+language%3APython+-path%3Adatasette%2F&ref=opensearch&type=code)
- It's only used by Datasette core and can't be used by plugins or 3rd parties
- It's created from scratch at startup and stored in memory. Why is fine, the performance is great, but persistent storage would be nice.

Additionally, it would be really nice if plugins could use this `_internal` database to store their own configuration, secrets, and settings. For example:

- `datasette-auth-tokens` [creates a `_datasette_auth_tokens` table](https://github.com/simonw/datasette-auth-tokens/blob/main/datasette_auth_tokens/__init__.py#L15) to store auth token metadata. This could be moved into the `_internal` database to avoid writing to the gues database
- `datasette-socrata` [creates a `socrata_imports`](https://github.com/simonw/datasette-socrata/blob/1409aa9b4d2fc3aff286b52e73af33b5786d56d0/datasette_socrata/__init__.py#L190-L198) table, which also can be in `_internal`
- `datasette-upload-csvs` [creates a `_csv_progress_`](https://github.com/simonw/datasette-upload-csvs/blob/main/datasette_upload_csvs/__init__.py#L154) table, which can be in `_internal`
- `datasette-write-ui` wants to have the ability for users to toggle whether a table appears editable, which can be either in `datasette.yaml` or on-the-fly by storing config in `_internal`


In general, these are specific features that Datasette plugins would have access to if there was a central internal database they could read/write to:

- **Dynamic configuration**. Changing the `datasette.yaml` file works, but can be tedious to restart the server every time. Plugins can define their own configuration table in `_internal`, and could read/write to it to store configuration based on user actions (cell menu click, API access, etc.)
- **Caching**. If a plugin or Datasette Core needs to cache some expensive computation, they can store it inside `_internal` (possibly as a temporary table) instead of managing their own caching solution.
- **Audit logs**. If a plugin performs some sensitive operations, they can log usage info to `_internal` for others to audit later. 
- **Long running process status**. Many plugins (`datasette-upload-csvs`, `datasette-litestream`, `datasette-socrata`) perform tasks that run for a really long time, and want to give continue status updates to the user. They can store this info inside` _internal`
- **Safer authentication**. Passwords and authentication plugins usually store credentials/hashed secrets in configuration files or environment variables, which can be difficult to handle. Now, they can store them in `_internal` 

## Proposal

- We remove `_internal` from [`datasette.databases`](https://docs.datasette.io/en/latest/internals.html#databases) property.
- We add new `datasette.get_internal_db()` method that returns the `_internal` database, for plugins to use
- We add a new `--internal internal.db` flag. If provided, then the `_internal` DB will be sourced from that file, and further updates will be persisted to that file (instead of an in-memory database)
- When creating internal.db, create a new `_datasette_internal` table to mark it a an ""datasette internal database""
- In `datasette serve`, we check for the existence of the `_datasette_internal` table. If it exists, we assume the user provided that file in error and raise an error. This is to limit the chance that someone accidentally publishes their internal database to the internet. We could optionally add a `--unsafe-allow-internal` flag (or database plugin) that allows someone to do this if they really want to.


## New features unlocked with this

These features don't really need a standardized `_internal` table per-say (plugins could currently configure their own long-time storage features if they really wanted to), but it would make it much simpler to create these kinds of features with a persistent application database.

- **`datasette-comments`** : A plugin for commenting on rows or specific values in a database. Comment contents + threads + email notification info can be stored in `_internal`
- **Bookmarks**: ""Bookmarking"" an SQL query could be stored in `_internal`, or a URL link shortener
- **Webhooks**: If a plugin wants to either consume a webhook or create a new one, they can store hashed credentials/API endpoints in `_internal`",107914493,datasette,issue,,,"{""url"": ""https://api.github.com/repos/simonw/datasette/issues/2157/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
1870672704,PR_kwDOBm6k_c5Y-7Em,2162,"Add new `--internal internal.db` option, deprecate legacy `_internal` database",15178711,asg017,closed,0,,,,,4,2023-08-29T00:05:07Z,2023-08-29T03:24:23Z,2023-08-29T03:24:23Z,CONTRIBUTOR,simonw/datasette/pulls/2162,"refs #2157 

This PR adds a new `--internal` option to datasette serve. If provided, it is the path to a persistent internal database that Datasette core and Datasette plugins can use to store data, as discussed in the proposal issue. 

This PR also removes and deprecates the previous in-memory `_internal` database. Those tables now appear in the `internal` database, with `core_` prefixes (ex `tables` in `_internal` is now `core_tables` in `internal`).


## A note on the new `core_` tables
However, one important notes about those new `core_` tables: If a `--internal` DB is passed in, that means those `core_` tables will persist across multiple Datasette instances. This wasn't the case before, since `_internal` was always an in-memory database created from scratch.

I tried to put those `core_` tables as `TEMP` tables - after all, there's always one 1 `internal` DB connection at a time, so I figured it would work. But, since we use the `Database()` wrapper for the internal DB, it has two separate connections: a default read-only connection and a write connection that is created when a write operation occurs. Which meant the `TEMP` tables would be created by the write connection, but not available in the read-only connection. 

So I had a brillant idea: Attach an in-memory named database with `cache=shared`, and create those tables there! 

```sql
ATTACH DATABASE 'file:datasette_internal_core?mode=memory&cache=shared' AS core;
```

We'd run this on both the read-only connection and the write-only connection. That way, those tables would stay in memory, they'd communicate with the `cache=shared` feature, and we'd be good to go.


However, I couldn't find an easy way to run a `ATTACH DATABASE` command on the read-only query. 

Using `Database()` as a wrapper for the internal DB is pretty limiting - it's meant for Datasette ""data"" databases, where we want multiple readers and possibly 1 write connection at a time. But the internal database doesn't really require that kind of support - I think we could get away with a single read/write connection, but it seemed like too big of a rabbithole to go through now. 


<!-- readthedocs-preview datasette start -->
----
:books: Documentation preview :books:: https://datasette--2162.org.readthedocs.build/en/2162/

<!-- readthedocs-preview datasette end -->",107914493,datasette,pull,,,"{""url"": ""https://api.github.com/repos/simonw/datasette/issues/2162/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",0,
1855885427,I_kwDOBm6k_c5unpBz,2143,De-tangling Metadata before Datasette 1.0,15178711,asg017,open,0,,,,,24,2023-08-18T00:51:50Z,2023-08-24T18:28:27Z,,CONTRIBUTOR,,"Metadata in Datasette is a really powerful feature, but is a bit difficult to work with. It was initially a way to add ""metadata"" about your ""data"" in Datasette instances, like descriptions for databases/tables/columns, titles, source URLs, licenses, etc. But it later became the go-to spot for other Datasette features that have nothing to do with metadata, like permissions/plugins/canned queries. 

Specifically, I've found the following problems when working with Datasette metadata:

1. Metadata cannot be updated without re-starting the entire Datasette instance.
2. The `metadata.json`/`metadata.yaml` has become a kitchen sink of unrelated (imo) features like plugin config, authentication config, canned queries
3. The Python APIs for defining extra metadata are a bit awkward (the `datasette.metadata()` class, `get_metadata()` hook, etc.)

## Possible solutions

Here's a few ideas of Datasette core changes we can make to address these problems. 

### Re-vamp the Datasette Python metadata APIs

The Datasette object has a single `datasette.metadata()` method that's a bit difficult to work with. There's also no Python API for inserted new metadata, so plugins have to rely on the `get_metadata()` hook.

The `get_metadata()` hook can also be improved - it doesn't work with async functions yet, so you're quite limited to what you can do.

(I'm a bit fuzzy on what to actually do here, but I imagine it'll be very small breaking changes to a few Python methods)

### Add an optional `datasette_metadata` table

Datasette should detect and use metadata stored in a new special table called `datasette_metadata`. This would be a regular table that a user can edit on their own, and would serve as a ""live updating"" source of metadata, than can be changed while the Datasette instance is running.

Not too sure what the schema would look like, but I'd imagine:

```sql
CREATE TABLE datasette_metadata(
  level text,
  target any,
  key text,
  value any,
  primary key (level, target)
)
```

Every row in this table would map to a single metadata ""entry"".

- `level` would be one of ""datasette"", ""database"", ""table"", ""column"", which is the ""level"" the entry describes. For example, `level=""table""` means it is metadata about a specific table, `level=""database""` for a specific database, or `level=""datasette""` for the entire Datasette instance.
- `target` would ""point"" to the specific object the entry metadata is about, and would depend on what `level` is specific. 
  - `level=""database""`: `target` would be the string name of the database that the metadata entry is about. ex `""fixtures""`
  - `level=""table""`: `target` would be a JSON array of two strings. The first element would be the database name, and the second would be the table name. ex `[""fixtures"", ""students""]`
  - `level=""column""`: `target` would be a JSON array of 3 strings: The database name, table name, and column name. Ex `[""fixtures"", ""students"", ""student_id""`]
- `key` would be the type of metadata entry the row has, similar to the current ""keys"" that exist in `metadata.json`. Ex `""about_url""`, `""source""`, `""description""`, etc
- `value` would be the text value of be metadata entry. The literal text value of a description, about_url, column_label, etc

A quick sample:

level | target | key | value
-- | -- | -- | --
datasette | NULL | title | my datasette title...
db | fixtures | source | <description of my database source>
table | [""fixtures"", ""students""] | label_column | student_name
column | [""fixtures"", ""students"", ""birthdate""] | description | <description of the fixtures.students.birthdate column>

This `datasette_metadata` would be configured with other tools, and hopefully not manually by end users. Datasette Core could also offer a UI for editing entries in `datasette_metadata`, to update descriptions/columns on the fly.

### Re-vamp `metadata.json` and move non-metadata config to another place

The motivation behind this is that it's awkward that `metadata.json` contains config about things that are not strictly metadata, including:

- Plugin configuration
- [Authentication/permissions](https://docs.datasette.io/en/latest/authentication.html#access-permissions-in-metadata) (ex the `allow` key on datasettes/databases/tables
- Canned queries. might be controversial, but in my mind, canned queries are application-specific code and configuration, and don't describe the data that exists in SQLite databases. 

I think we should move these outside of `metadata.json` and into a different file. The `datasette.json` idea in  #2093 may be a good solution here: plugin/permissions/canned queries can be defined in `datasette.json`, while `metadata.json`/`datasette_metadata` will strictly be about documenting databases/tables/columns. 
",107914493,datasette,issue,,,"{""url"": ""https://api.github.com/repos/simonw/datasette/issues/2143/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
1864112887,PR_kwDOBm6k_c5Yo7bk,2151,Test Datasette on multiple SQLite versions,15178711,asg017,open,0,,,,,1,2023-08-23T22:42:51Z,2023-08-23T22:58:13Z,,CONTRIBUTOR,simonw/datasette/pulls/2151,"still testing, hope it works!

<!-- readthedocs-preview datasette start -->
----
:books: Documentation preview :books:: https://datasette--2151.org.readthedocs.build/en/2151/

<!-- readthedocs-preview datasette end -->",107914493,datasette,pull,,,"{""url"": ""https://api.github.com/repos/simonw/datasette/issues/2151/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1,
1861812208,PR_kwDOBm6k_c5YhH-W,2149,"Start a new `datasette.yaml` configuration file, with settings support",15178711,asg017,closed,0,,,,,2,2023-08-22T16:24:16Z,2023-08-23T01:26:11Z,2023-08-23T01:26:11Z,CONTRIBUTOR,simonw/datasette/pulls/2149,"refs #2093 #2143 

This is the first step to implementing the new `datasette.yaml`/`datasette.json` configuration file. 

- The old `--config` argument is now back, and is the path to a `datasette.yaml` file. Acts like the `--metadata` flag.
- The old `settings.json` behavior has been removed.
- The `""settings""` key inside `datasette.yaml` defines the same `--settings` flags
- Values passed in `--settings` will over-write values in `datasette.yaml`

Docs for the Config file is pretty light, not much to add until we add more config to the file.

<!-- readthedocs-preview datasette start -->
----
:books: Documentation preview :books:: https://datasette--2149.org.readthedocs.build/en/2149/

<!-- readthedocs-preview datasette end -->",107914493,datasette,pull,,,"{""url"": ""https://api.github.com/repos/simonw/datasette/issues/2149/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",0,
1821108702,I_kwDOCGYnMM5si-ne,579,Special handling for SQLite column of type `JSON`,15178711,asg017,open,0,,,,,0,2023-07-25T20:37:23Z,2023-07-25T20:37:23Z,,CONTRIBUTOR,,"`sqlite-utils` should detect and have specially handling for column with a `JSON` column. For example:

```sql
CREATE TABLE ""dogs"" (
  id INTEGER PRIMARY KEY,
  name TEXT,
  friends JSON 
);
```

## Automatic Nesting

According to [""Nested JSON Values""](https://sqlite-utils.datasette.io/en/stable/cli.html#nested-json-values), sqlite-utils will only expand JSON if the `--json-cols` flag is passed. It looks like it'll try to `json.load` all text column to test if its JSON, which can get expensive on non-json columns. 

Instead, `sqlite-utils` should be default (ie without the `--json-cols` flags) do the `maybe_json()` operation on columns with a declared `JSON` type. So the above table would expand the `""friends""` column as expected, withoutthe `--json-cols` flag:

```bash
sqlite-utils dogs.db ""select * from dogs"" | python -mjson.tool
```

```
[
    {
        ""id"": 1,
        ""name"": ""Cleo"",
        ""friends"": [
            {
                ""name"": ""Pancakes""
            },
            {
                ""name"": ""Bailey""
            }
        ]
    }
]
```

---

I'm sure there's other ways `sqlite-utils` can specially handle JSON columns, so keeping this open while I think of more",140912432,sqlite-utils,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/579/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
1801394744,I_kwDOCGYnMM5rXxo4,567,Plugin system,15178711,asg017,closed,0,,,,,9,2023-07-12T17:02:14Z,2023-07-22T22:59:37Z,2023-07-22T22:59:36Z,CONTRIBUTOR,,"I'd like there to be a plugin system for sqlite-utils, similar to the datasette/llm plugins. I'd like to make plugins that would do things like:

- Register SQLite extensions for more SQL functions + virtual tables
- Register new subcommands
- Different input file formats for `sqlite-utils memory`
- Different output file formats (in addition to `--csv` `--tsv` `--nl` etc.

A few real-world use-cases of plugins I'd like to see in sqlite-utils:

- Register many of my sqlite extensions in sqlite-utils (`sqlite-http`, `sqlite-lines`, `sqlite-regex`, etc.)
- New subcommands to work with `sqlite-vss` vector tables
- Input/ouput Parquet/Avro/Arrow IPC files with `sqlite-arrow`",140912432,sqlite-utils,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/567/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed
1816917522,PR_kwDOCGYnMM5WJ6Jm,573,feat: Implement a prepare_connection plugin hook,15178711,asg017,closed,0,,,,,4,2023-07-22T22:48:44Z,2023-07-22T22:59:09Z,2023-07-22T22:59:09Z,CONTRIBUTOR,simonw/sqlite-utils/pulls/573,"Just like the [Datasette prepare_connection hook](https://docs.datasette.io/en/stable/plugin_hooks.html#prepare-connection-conn-database-datasette), this PR adds a similar hook for the `sqlite-utils` plugin system. 

The sole argument is `conn`, since I don't believe a `database` or `datasette` argument would be relevant here. 

I want to do this so I can release `sqlite-utils` plugins for my [SQLite extensions](https://github.com/asg017/sqlite-ecosystem), similar to the Datasette plugins I've release for them. 

An example plugin: https://gist.github.com/asg017/d7cdf0d56e2be87efda28cebee27fa3c

```bash
$ sqlite-utils install https://gist.github.com/asg017/d7cdf0d56e2be87efda28cebee27fa3c/archive/5f5ad549a40860787629c69ca120a08c32519e99.zip

$ sqlite-utils memory 'select hello(""alex"") as response'
[{""response"": ""Hello, alex!""}]
```
Refs:
- #574 

<!-- readthedocs-preview sqlite-utils start -->
----
:books: Documentation preview :books:: https://sqlite-utils--573.org.readthedocs.build/en/573/

<!-- readthedocs-preview sqlite-utils end -->",140912432,sqlite-utils,pull,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/573/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",0,
1783304750,I_kwDOBm6k_c5qSxIu,2094,JS Plugin Hooks for the Code Editor,15178711,asg017,open,0,,,,,0,2023-07-01T00:51:57Z,2023-07-01T00:51:57Z,,CONTRIBUTOR,,"When #2052 merges, I'd like to add support to add extensions/functions to the Datasette code editor. 

I'd eventually like to build a JS plugin for [`sqlite-docs`](https://github.com/asg017/sqlite-docs), to add things like:

- Inline documentation for tables/columns on hover
- Inline docs for custom functions that are loaded in
- More detailed autocomplete for tables/columns/functions

I did some hacking to see what this would look like, see here:

<img width=""1223"" alt=""image"" src=""https://github.com/simonw/datasette/assets/15178711/64f95cbc-1492-4365-896f-b88c6d08a649"">
<img width=""1223"" alt=""image"" src=""https://github.com/simonw/datasette/assets/15178711/73e602ba-5f45-417a-997e-5aea1738527a"">

There can be a new hook that allows JS plugins to add new ""extension"" in the CodeMirror editorview here:

https://github.com/simonw/datasette/blob/8cd60fd1d899952f1153460469b3175465f33f80/datasette/static/cm-editor-6.0.1.js#L25

Will need some more planning. For example, the Codemirror bundle in Datasette has functions that we could re-export for plugins to use (so we don't load 2 version of `""@codemirror/autocomplete""`, for example. ",107914493,datasette,issue,,,"{""url"": ""https://api.github.com/repos/simonw/datasette/issues/2094/reactions"", ""total_count"": 1, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 1, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
1620515757,I_kwDOBm6k_c5glxut,2039,Subtle bug with `--load-extension` and `--static` flags with absolute Windows paths with`C:\`,15178711,asg017,open,0,,,,,0,2023-03-12T21:18:52Z,2023-03-12T21:18:52Z,,CONTRIBUTOR,,"From the Datasette discord: A user tried running the following command on windows:

```
datasette  --load-extension=""C:\spatialite\mod_spatialite-5.0.1-win-x86\mod_spatialite.dll""
```
This failed with `""The specified module could not be found""`, because the entrypoint option introduced in #1789 splits the input differently. Instead of loading the extension found at `""C:\spatialite\mod_spatialite-5.0.1-win-x86\mod_spatialite.dll""`, it instead tried to load the extension at `""C""` with entrypoint `""\spatialite\mod_spatialite-5.0.1-win-x86\mod_spatialite.dll"". 

This is hard because most absolute windows paths have a colon in them, like `C:\foo.txt` or `D:\bar.txt`.  I'd image the `--static` flag is also vulnerable to this type of bug.


The ""solution"" is to use a relative path instead, but that doesn't feel that great. ",107914493,datasette,issue,,,"{""url"": ""https://api.github.com/repos/simonw/datasette/issues/2039/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
1344823170,PR_kwDOBm6k_c49e3_k,1789,Add new entrypoint option to `--load-extension`,15178711,asg017,closed,0,,,,,9,2022-08-19T19:27:47Z,2022-08-23T18:42:52Z,2022-08-23T18:34:30Z,CONTRIBUTOR,simonw/datasette/pulls/1789,"Closes #1784 

The `--load-extension` flag can now accept an optional ""entrypoint"" value, to specify which entrypoint SQLite should load from the given extension. 

```bash
# would load default entrypoint like before
datasette data.db --load-extension ext

# loads the extensions with the ""sqlite3_foo_init"" entrpoint
datasette data.db --load-extension ext:sqlite3_foo_init

# loads the extensions with the ""sqlite3_bar_init"" entrpoint
datasette data.db --load-extension ext:sqlite3_bar_init
```

For testing, I added a small SQLite extension in C at `tests/ext.c`. If compiled, then pytest will run the unit tests in `test_load_extensions.py`to verify that Datasette loads in extensions correctly (and loads the correct entrypoints). Compiling the extension requires a C compiler, I compiled it on my Mac with:

```
gcc ext.c -I path/to/sqlite -fPIC -shared -o ext.dylib
```

Where `path/to/sqlite` is a directory that contains the SQLite amalgamation header files.

Re documentation: I added a bit to the help text for `--load-extension` (which I believe should auto-add to documentation?), and the existing extension documentation is spatialite specific. Let me know if a new extensions documentation page would be helpful!",107914493,datasette,pull,,,"{""url"": ""https://api.github.com/repos/simonw/datasette/issues/1789/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",0,
1339663518,I_kwDOBm6k_c5P2aSe,1784,"Include ""entrypoint"" option on `--load-extension`?",15178711,asg017,closed,0,,,,,2,2022-08-16T00:22:57Z,2022-08-23T18:34:31Z,2022-08-23T18:34:31Z,CONTRIBUTOR,,"## Problem 

SQLite extensions have the option to define multiple ""entrypoints"" in each loadable extension. For example, the upcoming version of `sqlite-lines` will have 2 entrypoints: the default `sqlite3_lines_init` (which SQLite will automatically guess for) and `sqlite3_lines_noread_init`. The `sqlite3_lines_noread_init` version omits functions that read from the filesystem, which is necessary for security purposes when running untrusted SQL (which Datasette does).

(Similar multiple entrypoints will also be added for sqlite-http).



The `--load-extension` flag, however, doesn't give the option to specify a different entrypoint, so the default one is always used. 

## Proposal

I want there to be a new command line option of the `--load-extension` flag to specify a custom entrypoint like so:
```
datasette my.db \
  --load-extension ./lines0 sqlite3_lines0_noread_init
```

Then, under the hood, this line of code:

https://github.com/simonw/datasette/blob/7af67b54b7d9bca43e948510fc62f6db2b748fa8/datasette/app.py#L562

Would look something like this:

```python
 conn.execute(""SELECT load_extension(?, ?)"", [extension, entrypoint]) 
```

One potential problem: For backward compatibility, I'm not sure if Click allows cli flags to have variable number of options (""arity""). So I guess it could also use a `:` delimiter like `--static`:

```
datasette my.db \
  --load-extension ./lines0:sqlite3_lines0_noread_init
```

Or maybe even a new flag name?

```
datasette my.db \
  --load-extension-entrypoint ./lines0 sqlite3_lines0_noread_init
```


Personally I prefer the `:` option... and maybe even `--load-extension` -> `--load`? Definitely out of scope for this issue tho

```
datasette my.db \
  --load./lines0:sqlite3_lines0_noread_init
```",107914493,datasette,issue,,,"{""url"": ""https://api.github.com/repos/simonw/datasette/issues/1784/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed
1060631257,I_kwDOBm6k_c4_N_LZ,1528,"Add new `""sql_file""` key to Canned Queries in metadata?",15178711,asg017,open,0,,,,,3,2021-11-22T21:58:01Z,2022-06-10T03:23:08Z,,CONTRIBUTOR,,"Currently for canned queries, you have to inline SQL in your `metadata.yaml` like so:

```yaml
databases:
  fixtures:
    queries:
      neighborhood_search:
        sql: |-
          select neighborhood, facet_cities.name, state
          from facetable
            join facet_cities on facetable.city_id = facet_cities.id
          where neighborhood like '%' || :text || '%'
          order by neighborhood
        title: Search neighborhoods
```

This works fine, but for a few reasons, I usually have my canned queries already written in separate `.sql` files. I'd like to instead re-use those instead of re-writing it. 

So, I'd like to see a new `""sql_file""` key that works like so:

`metadata.yaml`:

```yaml
databases:
  fixtures:
    queries:
      neighborhood_search:
        sql_file: neighborhood_search.sql
        title: Search neighborhoods
```
`neighborhood_search.sql`:
```sql
select neighborhood, facet_cities.name, state
from facetable
join facet_cities on facetable.city_id = facet_cities.id
where neighborhood like '%' || :text || '%'
order by neighborhood
```

Both of these would work in the exact same way, where Datasette would instead open + include `neighborhood_search.sql` on startup. 


A few reasons why I'd like to keep my canned queries SQL separate from metadata.yaml:

- Keeping SQL in standalone SQL files means syntax highlighting and other text editor integrations in my code
- Multiline strings in yaml, while functional, are a tad cumbersome and are hard to edit
- Works well with other tools (can pipe `.sql` files into the `sqlite3` CLI, or use with other SQLite clients easier)
- Typically my canned queries are quite long compared to everything else in my metadata.yaml, so I'd love to separate it where possible

Let me know if this is a feature you'd like to see, I can try to send up a PR if this sounds right!",107914493,datasette,issue,,,"{""url"": ""https://api.github.com/repos/simonw/datasette/issues/1528/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
751195017,MDU6SXNzdWU3NTExOTUwMTc=,1111,Accessing a database's `.json` is slow for very large SQLite files,15178711,asg017,open,0,,,,,3,2020-11-26T00:27:27Z,2021-01-04T19:57:53Z,,CONTRIBUTOR,,"I have a SQLite DB that's pretty large, 23GB and something like 300 million rows. I expect that most queries I run on it will be slow, which is fine, but there are some things that Datasette does that makes working with the DB very slow. Specifically, when I access the `.json` metadata for a table (which I believe it comes from `datasette/views/database.py`, it takes 43 seconds for the request to come in:

```bash
$ time curl localhost:9999/out.json
{""database"": ""out"", ""size"": 24291454976, ""tables"": [{""name"": ""PageviewsHour"", ""columns"": [""file"", ""code"", ""page"", ""pageviews""], ""primary_keys"": [], ""count"": null, ""hidden"": false, ""fts_table"": null, ""foreign_keys"": {""incoming"": [], ""outgoing"": [{""other_table"": ""PageviewsHourFiles"", ""column"": ""file"", ""other_column"": ""file_id""}]}, ""private"": false}, {""name"": ""PageviewsHourFiles"", ""columns"": [""file_id"", ""filename"", ""sha256"", ""size"", ""day"", ""hour""], ""primary_keys"": [""file_id""], ""count"": null, ""hidden"": false, ""fts_table"": null, ""foreign_keys"": {""incoming"": [{""other_table"": ""PageviewsHour"", ""column"": ""file_id"", ""other_column"": ""file""}], ""outgoing"": []}, ""private"": false}, {""name"": ""sqlite_sequence"", ""columns"": [""name"", ""seq""], ""primary_keys"": [], ""count"": 1, ""hidden"": false, ""fts_table"": null, ""foreign_keys"": {""incoming"": [], ""outgoing"": []}, ""private"": false}], ""hidden_count"": 0, ""views"": [], ""queries"": [], ""private"": false, ""allow_execute_sql"": true, ""query_ms"": 43340.23213386536}
real	0m43.417s
user	0m0.006s
sys	0m0.016s
```

I suspect this is because a `COUNT(*)` is happening under the hood, which, when I run it through sqlite directly, does take around the same time:

```bash
$ time sqlite3 out.db < <(echo ""select count(*) from PageviewsHour;"")
362794272

real	0m44.523s
user	0m2.497s
sys	0m6.703s
```

I'm using the `.json` request in the [Observable Datasette Client](https://observablehq.com/@asg017/datasette-client) to 1) verify that a link passed in is a reachable Datasette instance, and 2) a quick way to look at metadata for a db. A few different solutions I can think of:

1. Have some other endpoint, like `/-/datasette.json` that the Observable Datasette client can fetch from to verify that the passed in URL is a valid Datasette (doesnt solve the slow problem, feel free to split this issue into 2)
2. Have a way to turn off table counts when accessing a database's `.json` view, like `?no_count=1` or something
3. Maybe have a timeout on the `table_counts()` function if it takes too long. which is odd, because it seems like it already does that (I think?), I can debug a little more if that's the case

More than happy to debug further, or send a PR if you like one of the proposals above!",107914493,datasette,issue,,,"{""url"": ""https://api.github.com/repos/simonw/datasette/issues/1111/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,