html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,issue,performed_via_github_app
https://github.com/simonw/datasette/issues/526#issuecomment-1254064260,https://api.github.com/repos/simonw/datasette/issues/526,1254064260,IC_kwDOBm6k_c5Kv4CE,536941,2022-09-21T18:17:04Z,2022-09-21T18:18:01Z,CONTRIBUTOR,"hi @simonw, this is becoming more of a bother for my [labor data warehouse](https://labordata.bunkum.us/). Is there any research or a spike i could do that would help you investigate this issue?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",459882902,
https://github.com/simonw/datasette/issues/1836#issuecomment-1272357976,https://api.github.com/repos/simonw/datasette/issues/1836,1272357976,IC_kwDOBm6k_c5L1qRY,536941,2022-10-08T16:56:51Z,2022-10-08T16:56:51Z,CONTRIBUTOR,"when you are running from docker, you **always** will want to run as `mode=ro` because the same thing that is causing duplication in the inspect layer will cause duplication in the final container read/write layer when `datasette serve` runs.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1400374908,
https://github.com/simonw/datasette/issues/526#issuecomment-1258337011,https://api.github.com/repos/simonw/datasette/issues/526,1258337011,IC_kwDOBm6k_c5LALLz,536941,2022-09-26T16:49:48Z,2022-09-26T16:49:48Z,CONTRIBUTOR,"i think the smallest change that gets close to what i want is to change the behavior so that `max_returned_rows` is not applied in the `execute` method when we are are asking for a csv of query.

there are some infelicities for that approach, but i'll make a PR to make it easier to discuss.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",459882902,
https://github.com/simonw/datasette/pull/1820#issuecomment-1258803261,https://api.github.com/repos/simonw/datasette/issues/1820,1258803261,IC_kwDOBm6k_c5LB9A9,536941,2022-09-27T00:03:09Z,2022-09-27T00:03:09Z,CONTRIBUTOR,"the pattern in this PR `max_returned_rows` control the maximum rows rendered through html and json, and the csv render bypasses that.

i think it would be better to have each of these different query renderers have more direct control for how many rows to fetch, instead of relying on the internals of the `execute` method.

generally, users will not want to paginate through tens of thousands of results, but often will want to download a full query as json or as csv. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1386456717,
https://github.com/simonw/datasette/issues/526#issuecomment-1258849766,https://api.github.com/repos/simonw/datasette/issues/526,1258849766,IC_kwDOBm6k_c5LCIXm,536941,2022-09-27T01:27:03Z,2022-09-27T01:27:03Z,CONTRIBUTOR,"i agree with that concern! but if i'm understanding the code correctly, `maximum_returned_rows` does not protect against long-running queries in any way.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",459882902,
https://github.com/simonw/datasette/issues/526#issuecomment-1258871525,https://api.github.com/repos/simonw/datasette/issues/526,1258871525,IC_kwDOBm6k_c5LCNrl,536941,2022-09-27T02:09:32Z,2022-09-27T02:14:53Z,CONTRIBUTOR,"thanks @simonw, i learned something i didn't know about sqlite's execution model!

> Imagine if Datasette CSVs did allow unlimited retrievals. Someone could hit the CSV endpoint for that recursive query and tie up Datasette's SQL connection effectively forever.

why wouldn't the `sqlite_timelimit` guard prevent that?

--- 
on my local version which has the code to [turn off truncations for query csv](#1820), `sqlite_timelimit` does protect me.

![Screenshot 2022-09-26 at 22-14-31 Error 500](https://user-images.githubusercontent.com/536941/192415680-94b32b7f-868f-4b89-8194-5752d45f6009.png)
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",459882902,
https://github.com/simonw/datasette/issues/526#issuecomment-1258878311,https://api.github.com/repos/simonw/datasette/issues/526,1258878311,IC_kwDOBm6k_c5LCPVn,536941,2022-09-27T02:19:48Z,2022-09-27T02:19:48Z,CONTRIBUTOR,"this sql query doesn't trip up `maximum_returned_rows` but does timeout

```sql
with recursive counter(x) as (
   select 0
     union
   select x + 1 from counter
 )
 select * from counter LIMIT 10 OFFSET 100000000 
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",459882902,
https://github.com/simonw/datasette/issues/526#issuecomment-1258910228,https://api.github.com/repos/simonw/datasette/issues/526,1258910228,IC_kwDOBm6k_c5LCXIU,536941,2022-09-27T03:11:07Z,2022-09-27T03:11:07Z,CONTRIBUTOR,"i think this feature would be safe, as its really only the time limit that can, and imo, should protect against long running queries, as it is pretty easy to make very expensive queries that don't return many rows.

moving away from `max_returned_rows` will requires some thinking about:

1. memory usage and data flows to handle potentially very large result sets
2. how to avoid rendering tens or hundreds of thousands of [html rows](#1655).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",459882902,
https://github.com/simonw/datasette/issues/526#issuecomment-1259718517,https://api.github.com/repos/simonw/datasette/issues/526,1259718517,IC_kwDOBm6k_c5LFcd1,536941,2022-09-27T16:02:51Z,2022-09-27T16:04:46Z,CONTRIBUTOR,"i think that `max_returned_rows` **is** a defense mechanism, just not for connection exhaustion. `max_returned_rows` is a defense mechanism against **memory bombs**.

if you are potentially yielding out hundreds of thousands or even millions of rows, you need to be quite careful about data flow to not run out of memory on the server, or on the client.

you have a lot of places in your code that are protective of that right now, but `max_returned_rows` acts as the final backstop.

so, given that, it makes sense to have removing `max_returned_rows` altogether be a non-goal, but instead allow for for specific codepaths (like streaming csv's) be able to bypass.

that could dramatically lower the surface area for a memory-bomb attack.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",459882902,
https://github.com/simonw/datasette/issues/1062#issuecomment-1260909128,https://api.github.com/repos/simonw/datasette/issues/1062,1260909128,IC_kwDOBm6k_c5LJ_JI,536941,2022-09-28T13:22:53Z,2022-09-28T14:09:54Z,CONTRIBUTOR,"if you went this route:

```python
with sqlite_timelimit(conn, time_limit_ms):
     c.execute(query)
     for chunk in c.fetchmany(chunk_size):
         yield from chunk
```

then `time_limit_ms` would probably have to be greatly extended, because the time spent in the loop will depend on the downstream processing.

i wonder if this was why you were thinking this feature would need a dedicated connection?

---

reading more, there's no real limit i can find on the number of active cursors (or more precisely active prepared statements objects, because sqlite doesn't really have cursors). 

maybe something like this would be okay?

```python
with sqlite_timelimit(conn, time_limit_ms):
     c.execute(query)
     # step through at least one to evaluate the statement, not sure if this is necessary
     yield c.execute.fetchone()
for chunk in c.fetchmany(chunk_size):
    yield from chunk
```

this seems quite weird that there's not more of limit of the number of active prepared statements, but i haven't been able to find one.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",732674148,
https://github.com/simonw/datasette/issues/1062#issuecomment-1260829829,https://api.github.com/repos/simonw/datasette/issues/1062,1260829829,IC_kwDOBm6k_c5LJryF,536941,2022-09-28T12:27:19Z,2022-09-28T12:27:19Z,CONTRIBUTOR,"for teaching `register_output_renderer` to stream it seems like the two options are to

1. a [nested query technique ](https://github.com/simonw/datasette/issues/526#issuecomment-505162238)to paginate through
2. a fetching model that looks like something
```python
with sqlite_timelimit(conn, time_limit_ms):
     c.execute(query)
     for chunk in c.fetchmany(chunk_size):
         yield from chunk
```
currently `db.execute` is not a generator, so this would probably need a new method?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",732674148,
https://github.com/simonw/datasette/issues/1480#issuecomment-1268613335,https://api.github.com/repos/simonw/datasette/issues/1480,1268613335,IC_kwDOBm6k_c5LnYDX,536941,2022-10-05T15:45:49Z,2022-10-05T15:45:49Z,CONTRIBUTOR,"running into this as i continue to grow my labor data warehouse.

Here a CloudRun PM says the container size should **not** count against memory: https://stackoverflow.com/a/56570717","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1015646369,
https://github.com/simonw/datasette/issues/1480#issuecomment-1268629159,https://api.github.com/repos/simonw/datasette/issues/1480,1268629159,IC_kwDOBm6k_c5Lnb6n,536941,2022-10-05T16:00:55Z,2022-10-05T16:00:55Z,CONTRIBUTOR,"as a next step, i'll fetch the docker image from the google registry, and see what memory and disk usage looks like when i run it locally.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1015646369,
https://github.com/simonw/datasette/issues/1480#issuecomment-1269847461,https://api.github.com/repos/simonw/datasette/issues/1480,1269847461,IC_kwDOBm6k_c5LsFWl,536941,2022-10-06T11:21:49Z,2022-10-06T11:21:49Z,CONTRIBUTOR,"thanks @simonw, i'll spend a little more time trying to figure out why this isn't working on cloudrun, and then will flip over to fly if i can't.

","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1015646369,
https://github.com/simonw/datasette/issues/1836#issuecomment-1271103097,https://api.github.com/repos/simonw/datasette/issues/1836,1271103097,IC_kwDOBm6k_c5Lw355,536941,2022-10-07T04:43:41Z,2022-10-07T04:43:41Z,CONTRIBUTOR,"@simonw, should i open up a new issue for investigating the differences between ""immutable=1"" and ""mode=ro"" and possibly switching to ""mode=ro"". Or would you like to keep that conversation in this issue?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1400374908,
https://github.com/simonw/datasette/issues/1836#issuecomment-1271100651,https://api.github.com/repos/simonw/datasette/issues/1836,1271100651,IC_kwDOBm6k_c5Lw3Tr,536941,2022-10-07T04:38:14Z,2022-10-07T04:38:14Z,CONTRIBUTOR,"> yes, and i also think that this is causing the apparent memory problems in #1480. when the container starts up, it will make some operation on the database in `immutable` mode which apparently makes some small change to the db file. if that's so, then the db files will be copied to the read/write layer which counts against cloudrun's memory allocation!
> 
> running a test of that now.

this completely addressed #1480 ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1400374908,
https://github.com/simonw/datasette/issues/1480#issuecomment-1271101072,https://api.github.com/repos/simonw/datasette/issues/1480,1271101072,IC_kwDOBm6k_c5Lw3aQ,536941,2022-10-07T04:39:10Z,2022-10-07T04:39:10Z,CONTRIBUTOR,switching from `immutable=1` to `mode=ro` completely addressed this. see https://github.com/simonw/datasette/issues/1836#issuecomment-1271100651 for details.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1015646369,
https://github.com/simonw/datasette/issues/1836#issuecomment-1270923537,https://api.github.com/repos/simonw/datasette/issues/1836,1270923537,IC_kwDOBm6k_c5LwMER,536941,2022-10-07T00:46:08Z,2022-10-07T00:46:08Z,CONTRIBUTOR,"i thought it was maybe to do with reading through all the files, but that does not seem to be the case

if i make a little test file like:

```python
# test_read.py
import hashlib
import sys
import pathlib

HASH_BLOCK_SIZE = 1024 * 1024

def inspect_hash(path):
    """"""Calculate the hash of a database, efficiently.""""""
    m = hashlib.sha256()
    with path.open(""rb"") as fp:
        while True:
            data = fp.read(HASH_BLOCK_SIZE)
            if not data:
                break
            m.update(data)

    return m.hexdigest()

inspect_hash(pathlib.Path(sys.argv[1]))
```

then a line in the Dockerfile like

```docker
RUN python test_read.py nlrb.db && echo ""[]"" > /etc/inspect.json
```

just produes a layer of `3B`
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1400374908,
https://github.com/simonw/datasette/issues/1836#issuecomment-1270936982,https://api.github.com/repos/simonw/datasette/issues/1836,1270936982,IC_kwDOBm6k_c5LwPWW,536941,2022-10-07T00:52:41Z,2022-10-07T00:52:41Z,CONTRIBUTOR,"it's not that the inspect command is somehow changing the db files. if i set them to only read-only, the ""inspect"" layer still has the same very large size.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1400374908,
https://github.com/simonw/datasette/issues/1836#issuecomment-1270988081,https://api.github.com/repos/simonw/datasette/issues/1836,1270988081,IC_kwDOBm6k_c5Lwb0x,536941,2022-10-07T01:19:01Z,2022-10-07T01:27:35Z,CONTRIBUTOR,"okay, some progress!! running some sql against a database file causes that file to get duplicated even if it doesn't apparently change the file.

make a little test script like this:

```python
# test_sql.py
import sqlite3
import sys

db_name = sys.argv[1]
conn = sqlite3.connect(f'file:/app/{db_name}', uri=True)
cur = conn.cursor()
cur.execute('select count(*) from filing')
print(cur.fetchone())
```

then 

```docker
RUN python test_sql.py nlrb.db
```

produced a layer that's the same size as `nlrb.db`!!

","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1400374908,
https://github.com/simonw/datasette/issues/1836#issuecomment-1270992795,https://api.github.com/repos/simonw/datasette/issues/1836,1270992795,IC_kwDOBm6k_c5Lwc-b,536941,2022-10-07T01:29:15Z,2022-10-07T01:50:14Z,CONTRIBUTOR,"fascinatingly, telling python to open sqlite in read only mode makes this layer have a size of 0

```python
# test_sql_ro.py
import sqlite3
import sys

db_name = sys.argv[1]
conn = sqlite3.connect(f'file:/app/{db_name}?mode=ro', uri=True)
cur = conn.cursor()
cur.execute('select count(*) from filing')
print(cur.fetchone())
```

that's quite weird because setting the file permissions to read only didn't do anything. (on reflection, that chmod isn't doing anything because the dockerfile commands are run as root)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1400374908,
https://github.com/simonw/datasette/issues/1836#issuecomment-1271003212,https://api.github.com/repos/simonw/datasette/issues/1836,1271003212,IC_kwDOBm6k_c5LwfhM,536941,2022-10-07T01:52:04Z,2022-10-07T01:52:04Z,CONTRIBUTOR,"and if we try immutable mode, which is how things are opened by `datasette inspect` we duplicate the files!!!

```python
# test_sql_immutable.py
import sqlite3
import sys

db_name = sys.argv[1]
conn = sqlite3.connect(f'file:/app/{db_name}?immutable=1', uri=True)
cur = conn.cursor()
cur.execute('select count(*) from filing')
print(cur.fetchone())
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1400374908,
https://github.com/simonw/datasette/issues/1836#issuecomment-1271008997,https://api.github.com/repos/simonw/datasette/issues/1836,1271008997,IC_kwDOBm6k_c5Lwg7l,536941,2022-10-07T02:00:37Z,2022-10-07T02:00:49Z,CONTRIBUTOR,"yes, and i also think that this is causing the apparent memory problems in #1480. when the container starts up, it will make some operation on the database in `immutable` mode which apparently makes some small change to the db file. if that's so, then the db files will be copied to the read/write layer which counts against cloudrun's memory allocation!

running a test of that now. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1400374908,
https://github.com/simonw/datasette/issues/1836#issuecomment-1271020193,https://api.github.com/repos/simonw/datasette/issues/1836,1271020193,IC_kwDOBm6k_c5Lwjqh,536941,2022-10-07T02:15:05Z,2022-10-07T02:21:08Z,CONTRIBUTOR,"when i hack the connect method to open non mutable files with ""mode=ro"" and not ""immutable=1"" https://github.com/simonw/datasette/blob/eff112498ecc499323c26612d707908831446d25/datasette/database.py#L79

then: 

```bash
870 B  RUN /bin/sh -c datasette inspect nlrb.db --inspect-file inspect-data.json
```

the `datasette inspect` layer is only the size of the json file!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1400374908,
https://github.com/simonw/datasette/issues/1301#issuecomment-1271035998,https://api.github.com/repos/simonw/datasette/issues/1301,1271035998,IC_kwDOBm6k_c5Lwnhe,536941,2022-10-07T02:38:04Z,2022-10-07T02:38:04Z,CONTRIBUTOR,the only mode that `publish cloudrun` supports right now is immutable,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",860722711,
https://github.com/simonw/datasette/pull/1870#issuecomment-1294237783,https://api.github.com/repos/simonw/datasette/issues/1870,1294237783,IC_kwDOBm6k_c5NJIBX,536941,2022-10-27T23:42:18Z,2022-10-27T23:42:18Z,CONTRIBUTOR,Relevant sqlite forum thread: https://www.sqlite.org/forum/forumpost/02f7bda329f41e30451472421cf9ce7f715b768ce3db02797db1768e47950d48,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1426379903,
https://github.com/simonw/datasette/pull/1870#issuecomment-1294285471,https://api.github.com/repos/simonw/datasette/issues/1870,1294285471,IC_kwDOBm6k_c5NJTqf,536941,2022-10-28T01:06:03Z,2022-10-28T01:06:03Z,CONTRIBUTOR,"as far as i can tell, [this is where the ""immutable"" argument is used](https://github.com/sqlite/sqlite/blob/c97bb14fab566f6fa8d967c8fd1e90f3702d5b73/src/pager.c#L4926-L4931) in sqlite:

```c
      pPager->noLock = sqlite3_uri_boolean(pPager->zFilename, ""nolock"", 0);
      if( (iDc & SQLITE_IOCAP_IMMUTABLE)!=0
       || sqlite3_uri_boolean(pPager->zFilename, ""immutable"", 0) ){
          vfsFlags |= SQLITE_OPEN_READONLY;
          goto act_like_temp_file;
      }
```

so it does set the read only flag, but then has a goto.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1426379903,
https://github.com/simonw/datasette/pull/1870#issuecomment-1295667649,https://api.github.com/repos/simonw/datasette/issues/1870,1295667649,IC_kwDOBm6k_c5NOlHB,536941,2022-10-29T00:52:43Z,2022-10-29T00:53:43Z,CONTRIBUTOR,"> Are you saying that I can build a container, but then when I run it and it does `datasette serve -i data.db ...` it will somehow modify the image, or create a new modified filesystem layer in the runtime environment, as a result of running that `serve` command?

Somehow, `datasette serve -i data.db` will lead to the `data.db` being modified, which will trigger a [copy-on-write](https://docs.docker.com/storage/storagedriver/#the-copy-on-write-cow-strategy) of `data.db` into the read-write layer of the container.

I don't understand **how** that happens.

it kind of feels like a bug in sqlite, but i can't quite follow the sqlite code.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1426379903,
https://github.com/simonw/datasette/issues/1890#issuecomment-1317889323,https://api.github.com/repos/simonw/datasette/issues/1890,1317889323,IC_kwDOBm6k_c5OjWUr,536941,2022-11-17T00:47:36Z,2022-11-17T00:47:36Z,CONTRIBUTOR,amazing! thanks @simonw ,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1448143294,
https://github.com/simonw/datasette/issues/1886#issuecomment-1321241426,https://api.github.com/repos/simonw/datasette/issues/1886,1321241426,IC_kwDOBm6k_c5OwItS,536941,2022-11-20T20:58:54Z,2022-11-20T20:58:54Z,CONTRIBUTOR,i wrote up a blog post of how i'm using it! https://bunkum.us/2022/11/20/mgdo-stack.html,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1447050738,
https://github.com/simonw/datasette/issues/1796#issuecomment-1364345071,https://api.github.com/repos/simonw/datasette/issues/1796,1364345071,IC_kwDOBm6k_c5RUkDv,536941,2022-12-23T21:27:02Z,2022-12-23T21:27:02Z,CONTRIBUTOR,@simonw is this issue closed by #1893?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1355148385,
https://github.com/simonw/datasette/issues/1614#issuecomment-1364345119,https://api.github.com/repos/simonw/datasette/issues/1614,1364345119,IC_kwDOBm6k_c5RUkEf,536941,2022-12-23T21:27:10Z,2022-12-23T21:27:10Z,CONTRIBUTOR,is this issue closed by #1893?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1115435536,
https://github.com/simonw/datasette/issues/1099#issuecomment-1402563930,https://api.github.com/repos/simonw/datasette/issues/1099,1402563930,IC_kwDOBm6k_c5TmW1a,536941,2023-01-24T20:11:11Z,2023-01-24T20:11:11Z,CONTRIBUTOR,"hi @simonw, this bug bit me today.

the UX for linking from a table to the foreign key seems tough! 

the design in the other direction seems a lot easier, for a given primary key detail page, add links back to the tables that refer to the row.

would you be open to a PR that solved the second problem but not the first?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",743371103,
https://github.com/simonw/datasette/issues/1099#issuecomment-1402900354,https://api.github.com/repos/simonw/datasette/issues/1099,1402900354,IC_kwDOBm6k_c5Tno-C,536941,2023-01-25T00:58:26Z,2023-01-25T00:58:26Z,CONTRIBUTOR,"> My original idea for compound foreign keys was to turn both of those columns into links, but that doesn't fit here because `database_name` is already part of a different foreign key.

it's pretty hard to know what the right thing to do is if a field is part of multiple foreign keys. 

but, if that's not the case, what about making each of the columns a link. seems like an improvement over the status quo.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",743371103,
https://github.com/simonw/datasette/pull/2003#issuecomment-1402898033,https://api.github.com/repos/simonw/datasette/issues/2003,1402898033,IC_kwDOBm6k_c5TnoZx,536941,2023-01-25T00:54:41Z,2023-01-25T00:54:41Z,CONTRIBUTOR,"@simonw, let me know what you think about this approach!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1555701851,
https://github.com/simonw/datasette/issues/1099#issuecomment-1402898291,https://api.github.com/repos/simonw/datasette/issues/1099,1402898291,IC_kwDOBm6k_c5Tnodz,536941,2023-01-25T00:55:06Z,2023-01-25T00:55:06Z,CONTRIBUTOR,"I went ahead and spiked something together, in #2003 ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",743371103,
https://github.com/simonw/datasette/pull/2003#issuecomment-1404065571,https://api.github.com/repos/simonw/datasette/issues/2003,1404065571,IC_kwDOBm6k_c5TsFcj,536941,2023-01-25T18:44:42Z,2023-01-25T18:44:42Z,CONTRIBUTOR,see this related discussion to a change in API in sqlite-utils https://github.com/simonw/sqlite-utils/pull/203#issuecomment-753567932,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1555701851,
https://github.com/simonw/datasette/issues/1655#issuecomment-1766994810,https://api.github.com/repos/simonw/datasette/issues/1655,1766994810,IC_kwDOBm6k_c5pUjN6,536941,2023-10-17T19:01:59Z,2023-10-17T19:01:59Z,CONTRIBUTOR,"hi @yejiyang, have your tried using my fork of datasette: https://github.com/fgregg/datasette/tree/no_limit_csv_publish
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1163369515,
https://github.com/simonw/datasette/issues/1655#issuecomment-1767219901,https://api.github.com/repos/simonw/datasette/issues/1655,1767219901,IC_kwDOBm6k_c5pVaK9,536941,2023-10-17T21:29:03Z,2023-10-17T21:29:03Z,CONTRIBUTOR,@yejiyang why don’t you move this discussion to my fork to spare simon’s notifications ,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1163369515,
https://github.com/simonw/datasette/issues/2213#issuecomment-1843072926,https://api.github.com/repos/simonw/datasette/issues/2213,1843072926,IC_kwDOBm6k_c5t2w-e,536941,2023-12-06T15:05:44Z,2023-12-06T15:05:44Z,CONTRIBUTOR,"it probably does not make sense to gzip large sqlite database files on the fly. it can take many seconds to gzip a large file and you either have to have this big thing in memory, or write it to disk, which some deployment environments will not like.

i wonder if it would make sense to gzip the databases as part of the datasette publish process. it would be very cool to statically serve those as if they dynamically zipped (i.e. serve the filename example.db, not example.db.zip, and rely on the browser to expand).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",2028698018,
https://github.com/simonw/sqlite-utils/issues/26#issuecomment-964205475,https://api.github.com/repos/simonw/sqlite-utils/issues/26,964205475,IC_kwDOCGYnMM45eJuj,536941,2021-11-09T14:31:29Z,2021-11-09T14:31:29Z,CONTRIBUTOR,i was just reaching for a tool to do this this morning,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",455486286,
https://github.com/simonw/sqlite-utils/issues/353#issuecomment-991405755,https://api.github.com/repos/simonw/sqlite-utils/issues/353,991405755,IC_kwDOCGYnMM47F6a7,536941,2021-12-11T01:38:29Z,2021-12-11T01:38:29Z,CONTRIBUTOR,"wow! that's awesome! thanks so much, @simonw!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1077102934,
https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1007636709,https://api.github.com/repos/simonw/sqlite-utils/issues/365,1007636709,IC_kwDOCGYnMM48D1Dl,536941,2022-01-07T18:28:33Z,2022-01-07T18:29:43Z,CONTRIBUTOR,"i added an index to one table with sqlite-utils, and then a query that used to take about 1 second started taking hundreds of seconds. 

running analyze got me back to sub second speed.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1096558279,
https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1008164786,https://api.github.com/repos/simonw/sqlite-utils/issues/365,1008164786,IC_kwDOCGYnMM48F1-y,536941,2022-01-08T22:24:19Z,2022-01-08T22:24:19Z,CONTRIBUTOR,the out-of-date scenario you describe could be addressed by automatically adding an analyze to the insert or convert commands if they implicate an index,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1096558279,
https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1008164116,https://api.github.com/repos/simonw/sqlite-utils/issues/365,1008164116,IC_kwDOCGYnMM48F10U,536941,2022-01-08T22:18:57Z,2022-01-08T22:18:57Z,CONTRIBUTOR,"the table with the query ran so bad was about 50k. 

i think the scenario should not be worse than no stats. 

i also did not know that sqlite was so different from postgres and needed an explicit analyze call.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1096558279,
https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1008161965,https://api.github.com/repos/simonw/sqlite-utils/issues/365,1008161965,IC_kwDOCGYnMM48F1St,536941,2022-01-08T22:02:56Z,2022-01-08T22:02:56Z,CONTRIBUTOR,"for options 2 and 3, i would worry about discoverablity. 

in other db’s it is not necessary to explicitly call analyze for most indices. ie for postgres

> The system regularly collects statistics on all of a table's columns. Newly-created non-expression indexes can immediately use these statistics to determine an index's usefulness.

i suppose i would propose raising a warning if the stats table is created that explains what is going on and informs users about a —no-analyze argument.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1096558279,
https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1008166084,https://api.github.com/repos/simonw/sqlite-utils/issues/365,1008166084,IC_kwDOCGYnMM48F2TE,536941,2022-01-08T22:32:47Z,2022-01-08T22:32:47Z,CONTRIBUTOR,or using “ pragma optimize”,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1096558279,
https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1008275546,https://api.github.com/repos/simonw/sqlite-utils/issues/365,1008275546,IC_kwDOCGYnMM48GRBa,536941,2022-01-09T11:01:15Z,2022-01-09T13:37:51Z,CONTRIBUTOR,"i don’t want to be such a partisan for analyze, but the query planner deciding *not* to use an index based on information collected by analyze is not necessarily a bug, but could be the correct choice.

<s>the original poster in that stack overflow doesn’t say there’s a performance regression </s>","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1096558279,
https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1009548580,https://api.github.com/repos/simonw/sqlite-utils/issues/365,1009548580,IC_kwDOCGYnMM48LH0k,536941,2022-01-11T02:43:34Z,2022-01-11T02:43:34Z,CONTRIBUTOR,thanks so much! always a pleasure to see how you work through these things,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1096558279,
https://github.com/simonw/sqlite-utils/issues/26#issuecomment-1032120014,https://api.github.com/repos/simonw/sqlite-utils/issues/26,1032120014,IC_kwDOCGYnMM49hObO,536941,2022-02-08T01:32:34Z,2022-02-08T01:32:34Z,CONTRIBUTOR,"if you are curious about prior art, https://github.com/jsnell/json-to-multicsv is really good!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",455486286,
https://github.com/simonw/sqlite-utils/issues/403#issuecomment-1032126353,https://api.github.com/repos/simonw/sqlite-utils/issues/403,1032126353,IC_kwDOCGYnMM49hP-R,536941,2022-02-08T01:45:15Z,2022-02-08T01:45:31Z,CONTRIBUTOR,"you can hack something like this to achieve this result:

`sqlite-utils convert my_database my_table rowid ""{'id': value}"" --multi`","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1126692066,
https://github.com/simonw/sqlite-utils/issues/403#issuecomment-1033332570,https://api.github.com/repos/simonw/sqlite-utils/issues/403,1033332570,IC_kwDOCGYnMM49l2da,536941,2022-02-09T04:22:43Z,2022-02-09T04:22:43Z,CONTRIBUTOR,dddoooope,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1126692066,
https://github.com/simonw/sqlite-utils/issues/423#issuecomment-1189010812,https://api.github.com/repos/simonw/sqlite-utils/issues/423,1189010812,IC_kwDOCGYnMM5G3t18,536941,2022-07-19T12:47:39Z,2022-07-19T12:47:39Z,CONTRIBUTOR,just ran into this!,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1199158210,
https://github.com/simonw/sqlite-utils/issues/456#issuecomment-1190272780,https://api.github.com/repos/simonw/sqlite-utils/issues/456,1190272780,IC_kwDOCGYnMM5G8h8M,536941,2022-07-20T13:14:54Z,2022-07-20T13:14:54Z,CONTRIBUTOR,"for example, i have data on votes that look like this:

| ballot_id | option_id | choice |
|-|-|-|
| 1 | 1 | 0 | 
| 1 | 2 | 1 |
| 1 | 3 | 0 |
| 1 | 4 | 1 |
| 2 | 1 | 1 |
| 2 | 2 | 0 |
| 2 | 3 | 1 |
| 2 | 4 | 0 |

and i want to reshape from this long form to this wide form:

| ballot_id | option_id_1 | option_id_2 | option_id_3 | option_id_ 4|
|-|-|-|-| -|
| 1 | 0 | 1 | 0 | 1 | 
| 2 | 1 | 0 | 1| 0 | 

i could do such a think like this.

```sql
select ballot_id, 
sum(choice) filter (where option_id = 1) as option_id_1,
sum(choice) filter (where option_id = 2) as option_id_2,
sum(choice) filter (where option_id = 3) as option_id_3,
sum(choice) filter (where option_id = 4) as option_id_4
from vote
group by ballot_id
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1310243385,
https://github.com/simonw/sqlite-utils/issues/456#issuecomment-1190277829,https://api.github.com/repos/simonw/sqlite-utils/issues/456,1190277829,IC_kwDOCGYnMM5G8jLF,536941,2022-07-20T13:19:15Z,2022-07-20T13:19:15Z,CONTRIBUTOR,hadley wickham's melt and reshape could be good inspo: http://had.co.nz/reshape/introduction.pdf,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1310243385,
https://github.com/simonw/sqlite-utils/issues/523#issuecomment-1407264466,https://api.github.com/repos/simonw/sqlite-utils/issues/523,1407264466,IC_kwDOCGYnMM5T4SbS,536941,2023-01-28T02:41:14Z,2023-01-28T02:41:14Z,CONTRIBUTOR,"I also often then run another little script to cast all empty strings to null, but i save that for another issue if this gets accepted.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1560651350,
https://github.com/simonw/sqlite-utils/pull/203#issuecomment-1404070841,https://api.github.com/repos/simonw/sqlite-utils/issues/203,1404070841,IC_kwDOCGYnMM5TsGu5,536941,2023-01-25T18:47:18Z,2023-01-25T18:47:18Z,CONTRIBUTOR,i'll adopt this PR to make the changes @simonw suggested https://github.com/simonw/sqlite-utils/pull/203#issuecomment-753567932,"{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",743384829,