
issue_comments


79 rows where user = 536941 sorted by updated_at descending


Suggested facets: reactions, created_at (date), updated_at (date)

issue >30

  • Stream all results for arbitrary SQL and canned queries 10
  • docker image is duplicating db files somehow 10
  • create-index should run analyze after creating index 7
  • Exceeding Cloud Run memory limits when deploying a 4.8G database 4
  • Support linking to compound foreign keys 3
  • don't use immutable=1, only mode=ro 3
  • Mechanism for turning nested JSON into foreign keys / many-to-many 2
  • Refactor .csv to be an output renderer - and teach register_output_renderer to stream all rows 2
  • `publish cloudrun` should deploy a more recent SQLite version 2
  • Redesign CSV export to improve usability 2
  • if csv export is truncated in non streaming mode set informative response header 2
  • Document how to add a primary key to a rowid table using `sqlite-utils transform --pk` 2
  • query result page is using 400mb of browser memory 40x size of html page and 400x size of csv data 2
  • feature request: pivot command 2
  • google cloudrun updated their limits on maxscale based on memory and cpu count 2
  • Show referring tables and rows when the referring foreign key is compound 2
  • changes to allow for compound foreign keys 1
  • Feature or Documentation Request: Individual table as home page template 1
  • Publishing to cloudrun with immutable mode? 1
  • unordered list is not rendering bullet points in description_html on database page 1
  • Allow routes to have extra options 1
  • Allow passing a file of code to "sqlite-utils convert" 1
  • add hash id to "_memory" url if hashed url mode is turned on and crossdb is also turned on 1
  • introduce new option for datasette package to use a slim base image 1
  • when hashed urls are turned on, the _memory db has improperly long-lived cache expiry 1
  • don't set far expiry if hash is '000' 1
  • consider adding deletion step of cloudbuild artifacts to gcloud publish 1
  • Try again with SQLite codemirror support 1
  • Tweak mobile keyboard settings 1
  • Mechanism for disabling faceting on large tables only 1
  • …

user 1

  • fgregg · 79

author_association 1

  • CONTRIBUTOR 79
Columns: id, html_url, issue_url, node_id, user, created_at, updated_at ▲, author_association, body, reactions, issue, performed_via_github_app
1407264466 https://github.com/simonw/sqlite-utils/issues/523#issuecomment-1407264466 https://api.github.com/repos/simonw/sqlite-utils/issues/523 IC_kwDOCGYnMM5T4SbS fgregg 536941 2023-01-28T02:41:14Z 2023-01-28T02:41:14Z CONTRIBUTOR

I also often then run another little script to cast all empty strings to null, but i save that for another issue if this gets accepted.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Feature request: trim all leading and trailing white space for all columns for all tables in a database 1560651350  
1404070841 https://github.com/simonw/sqlite-utils/pull/203#issuecomment-1404070841 https://api.github.com/repos/simonw/sqlite-utils/issues/203 IC_kwDOCGYnMM5TsGu5 fgregg 536941 2023-01-25T18:47:18Z 2023-01-25T18:47:18Z CONTRIBUTOR

i'll adopt this PR to make the changes @simonw suggested https://github.com/simonw/sqlite-utils/pull/203#issuecomment-753567932

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
changes to allow for compound foreign keys 743384829  
1404065571 https://github.com/simonw/datasette/pull/2003#issuecomment-1404065571 https://api.github.com/repos/simonw/datasette/issues/2003 IC_kwDOBm6k_c5TsFcj fgregg 536941 2023-01-25T18:44:42Z 2023-01-25T18:44:42Z CONTRIBUTOR

see this related discussion about a change in the sqlite-utils API: https://github.com/simonw/sqlite-utils/pull/203#issuecomment-753567932

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Show referring tables and rows when the referring foreign key is compound 1555701851  
1402900354 https://github.com/simonw/datasette/issues/1099#issuecomment-1402900354 https://api.github.com/repos/simonw/datasette/issues/1099 IC_kwDOBm6k_c5Tno-C fgregg 536941 2023-01-25T00:58:26Z 2023-01-25T00:58:26Z CONTRIBUTOR

My original idea for compound foreign keys was to turn both of those columns into links, but that doesn't fit here because database_name is already part of a different foreign key.

it's pretty hard to know what the right thing to do is if a field is part of multiple foreign keys.

but, if that's not the case, what about making each of the columns a link? seems like an improvement over the status quo.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Support linking to compound foreign keys 743371103  
1402898291 https://github.com/simonw/datasette/issues/1099#issuecomment-1402898291 https://api.github.com/repos/simonw/datasette/issues/1099 IC_kwDOBm6k_c5Tnodz fgregg 536941 2023-01-25T00:55:06Z 2023-01-25T00:55:06Z CONTRIBUTOR

I went ahead and spiked something together, in #2003

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Support linking to compound foreign keys 743371103  
1402898033 https://github.com/simonw/datasette/pull/2003#issuecomment-1402898033 https://api.github.com/repos/simonw/datasette/issues/2003 IC_kwDOBm6k_c5TnoZx fgregg 536941 2023-01-25T00:54:41Z 2023-01-25T00:54:41Z CONTRIBUTOR

@simonw, let me know what you think about this approach!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Show referring tables and rows when the referring foreign key is compound 1555701851  
1402563930 https://github.com/simonw/datasette/issues/1099#issuecomment-1402563930 https://api.github.com/repos/simonw/datasette/issues/1099 IC_kwDOBm6k_c5TmW1a fgregg 536941 2023-01-24T20:11:11Z 2023-01-24T20:11:11Z CONTRIBUTOR

hi @simonw, this bug bit me today.

the UX for linking from a table to the foreign key seems tough!

the design in the other direction seems a lot easier: for a given primary key detail page, add links back to the tables that refer to the row.

would you be open to a PR that solved the second problem but not the first?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Support linking to compound foreign keys 743371103  
1364345119 https://github.com/simonw/datasette/issues/1614#issuecomment-1364345119 https://api.github.com/repos/simonw/datasette/issues/1614 IC_kwDOBm6k_c5RUkEf fgregg 536941 2022-12-23T21:27:10Z 2022-12-23T21:27:10Z CONTRIBUTOR

is this issue closed by #1893?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Try again with SQLite codemirror support 1115435536  
1364345071 https://github.com/simonw/datasette/issues/1796#issuecomment-1364345071 https://api.github.com/repos/simonw/datasette/issues/1796 IC_kwDOBm6k_c5RUkDv fgregg 536941 2022-12-23T21:27:02Z 2022-12-23T21:27:02Z CONTRIBUTOR

@simonw is this issue closed by #1893?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Research an upgrade to CodeMirror 6 1355148385  
1321241426 https://github.com/simonw/datasette/issues/1886#issuecomment-1321241426 https://api.github.com/repos/simonw/datasette/issues/1886 IC_kwDOBm6k_c5OwItS fgregg 536941 2022-11-20T20:58:54Z 2022-11-20T20:58:54Z CONTRIBUTOR

i wrote up a blog post of how i'm using it! https://bunkum.us/2022/11/20/mgdo-stack.html

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Call for birthday presents: if you're using Datasette, let us know how you're using it here 1447050738  
1317889323 https://github.com/simonw/datasette/issues/1890#issuecomment-1317889323 https://api.github.com/repos/simonw/datasette/issues/1890 IC_kwDOBm6k_c5OjWUr fgregg 536941 2022-11-17T00:47:36Z 2022-11-17T00:47:36Z CONTRIBUTOR

amazing! thanks @simonw

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Autocomplete text entry for filter values that correspond to facets 1448143294  
1295667649 https://github.com/simonw/datasette/pull/1870#issuecomment-1295667649 https://api.github.com/repos/simonw/datasette/issues/1870 IC_kwDOBm6k_c5NOlHB fgregg 536941 2022-10-29T00:52:43Z 2022-10-29T00:53:43Z CONTRIBUTOR

Are you saying that I can build a container, but then when I run it and it does datasette serve -i data.db ... it will somehow modify the image, or create a new modified filesystem layer in the runtime environment, as a result of running that serve command?

Somehow, datasette serve -i data.db will lead to the data.db being modified, which will trigger a copy-on-write of data.db into the read-write layer of the container.

I don't understand how that happens.

it kind of feels like a bug in sqlite, but i can't quite follow the sqlite code.
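
a minimal way to check whether that copy-on-write actually happens (the container name here is just a placeholder) is to diff the running container's writable layer:

```bash
# list files that have been added or changed in the container's writable layer
docker diff my-datasette-container
```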

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
don't use immutable=1, only mode=ro 1426379903  
1294285471 https://github.com/simonw/datasette/pull/1870#issuecomment-1294285471 https://api.github.com/repos/simonw/datasette/issues/1870 IC_kwDOBm6k_c5NJTqf fgregg 536941 2022-10-28T01:06:03Z 2022-10-28T01:06:03Z CONTRIBUTOR

as far as i can tell, this is where the "immutable" argument is used in sqlite:

```c
pPager->noLock = sqlite3_uri_boolean(pPager->zFilename, "nolock", 0);
if( (iDc & SQLITE_IOCAP_IMMUTABLE)!=0
 || sqlite3_uri_boolean(pPager->zFilename, "immutable", 0) ){
  vfsFlags |= SQLITE_OPEN_READONLY;
  goto act_like_temp_file;
}
```

so it does set the read only flag, but then has a goto.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
don't use immutable=1, only mode=ro 1426379903  
1294237783 https://github.com/simonw/datasette/pull/1870#issuecomment-1294237783 https://api.github.com/repos/simonw/datasette/issues/1870 IC_kwDOBm6k_c5NJIBX fgregg 536941 2022-10-27T23:42:18Z 2022-10-27T23:42:18Z CONTRIBUTOR

Relevant sqlite forum thread: https://www.sqlite.org/forum/forumpost/02f7bda329f41e30451472421cf9ce7f715b768ce3db02797db1768e47950d48

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
don't use immutable=1, only mode=ro 1426379903  
1272357976 https://github.com/simonw/datasette/issues/1836#issuecomment-1272357976 https://api.github.com/repos/simonw/datasette/issues/1836 IC_kwDOBm6k_c5L1qRY fgregg 536941 2022-10-08T16:56:51Z 2022-10-08T16:56:51Z CONTRIBUTOR

when you are running from docker, you always will want to run as mode=ro because the same thing that is causing duplication in the inspect layer will cause duplication in the final container read/write layer when datasette serve runs.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
docker image is duplicating db files somehow 1400374908  
1271103097 https://github.com/simonw/datasette/issues/1836#issuecomment-1271103097 https://api.github.com/repos/simonw/datasette/issues/1836 IC_kwDOBm6k_c5Lw355 fgregg 536941 2022-10-07T04:43:41Z 2022-10-07T04:43:41Z CONTRIBUTOR

@simonw, should i open up a new issue for investigating the differences between "immutable=1" and "mode=ro" and possibly switching to "mode=ro"? Or would you like to keep that conversation in this issue?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
docker image is duplicating db files somehow 1400374908  
1271101072 https://github.com/simonw/datasette/issues/1480#issuecomment-1271101072 https://api.github.com/repos/simonw/datasette/issues/1480 IC_kwDOBm6k_c5Lw3aQ fgregg 536941 2022-10-07T04:39:10Z 2022-10-07T04:39:10Z CONTRIBUTOR

switching from immutable=1 to mode=ro completely addressed this. see https://github.com/simonw/datasette/issues/1836#issuecomment-1271100651 for details.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Exceeding Cloud Run memory limits when deploying a 4.8G database 1015646369  
1271100651 https://github.com/simonw/datasette/issues/1836#issuecomment-1271100651 https://api.github.com/repos/simonw/datasette/issues/1836 IC_kwDOBm6k_c5Lw3Tr fgregg 536941 2022-10-07T04:38:14Z 2022-10-07T04:38:14Z CONTRIBUTOR

yes, and i also think that this is causing the apparent memory problems in #1480. when the container starts up, it will make some operation on the database in immutable mode which apparently makes some small change to the db file. if that's so, then the db files will be copied to the read/write layer which counts against cloudrun's memory allocation!

running a test of that now.

this completely addressed #1480

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
docker image is duplicating db files somehow 1400374908  
1271035998 https://github.com/simonw/datasette/issues/1301#issuecomment-1271035998 https://api.github.com/repos/simonw/datasette/issues/1301 IC_kwDOBm6k_c5Lwnhe fgregg 536941 2022-10-07T02:38:04Z 2022-10-07T02:38:04Z CONTRIBUTOR

the only mode that publish cloudrun supports right now is immutable

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Publishing to cloudrun with immutable mode? 860722711  
1271020193 https://github.com/simonw/datasette/issues/1836#issuecomment-1271020193 https://api.github.com/repos/simonw/datasette/issues/1836 IC_kwDOBm6k_c5Lwjqh fgregg 536941 2022-10-07T02:15:05Z 2022-10-07T02:21:08Z CONTRIBUTOR

when i hack the connect method to open non-mutable files with "mode=ro" instead of "immutable=1" (https://github.com/simonw/datasette/blob/eff112498ecc499323c26612d707908831446d25/datasette/database.py#L79)

then:

```bash
870 B   RUN /bin/sh -c datasette inspect nlrb.db --inspect-file inspect-data.json
```

the datasette inspect layer is only the size of the json file!
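
roughly, the hack boils down to building the connection URI with `mode=ro` instead of `immutable=1`; a minimal sketch (not datasette's actual code) looks like:

```python
import sqlite3

def connect_read_only(path):
    # open read-only without asserting immutability, so sqlite never needs to
    # touch the file and docker never copies it into a new layer
    return sqlite3.connect(f"file:{path}?mode=ro", uri=True)
```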

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
docker image is duplicating db files somehow 1400374908  
1271008997 https://github.com/simonw/datasette/issues/1836#issuecomment-1271008997 https://api.github.com/repos/simonw/datasette/issues/1836 IC_kwDOBm6k_c5Lwg7l fgregg 536941 2022-10-07T02:00:37Z 2022-10-07T02:00:49Z CONTRIBUTOR

yes, and i also think that this is causing the apparent memory problems in #1480. when the container starts up, it will make some operation on the database in immutable mode which apparently makes some small change to the db file. if that's so, then the db files will be copied to the read/write layer which counts against cloudrun's memory allocation!

running a test of that now.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
docker image is duplicating db files somehow 1400374908  
1271003212 https://github.com/simonw/datasette/issues/1836#issuecomment-1271003212 https://api.github.com/repos/simonw/datasette/issues/1836 IC_kwDOBm6k_c5LwfhM fgregg 536941 2022-10-07T01:52:04Z 2022-10-07T01:52:04Z CONTRIBUTOR

and if we try immutable mode, which is how things are opened by datasette inspect, we duplicate the files!!!

```python
# test_sql_immutable.py
import sqlite3
import sys

db_name = sys.argv[1]
conn = sqlite3.connect(f'file:/app/{db_name}?immutable=1', uri=True)
cur = conn.cursor()
cur.execute('select count(*) from filing')
print(cur.fetchone())
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
docker image is duplicating db files somehow 1400374908  
1270992795 https://github.com/simonw/datasette/issues/1836#issuecomment-1270992795 https://api.github.com/repos/simonw/datasette/issues/1836 IC_kwDOBm6k_c5Lwc-b fgregg 536941 2022-10-07T01:29:15Z 2022-10-07T01:50:14Z CONTRIBUTOR

fascinatingly, telling python to open sqlite in read only mode makes this layer have a size of 0

```python
# test_sql_ro.py
import sqlite3
import sys

db_name = sys.argv[1]
conn = sqlite3.connect(f'file:/app/{db_name}?mode=ro', uri=True)
cur = conn.cursor()
cur.execute('select count(*) from filing')
print(cur.fetchone())
```

that's quite weird because setting the file permissions to read only didn't do anything. (on reflection, that chmod isn't doing anything because the dockerfile commands are run as root)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
docker image is duplicating db files somehow 1400374908  
1270988081 https://github.com/simonw/datasette/issues/1836#issuecomment-1270988081 https://api.github.com/repos/simonw/datasette/issues/1836 IC_kwDOBm6k_c5Lwb0x fgregg 536941 2022-10-07T01:19:01Z 2022-10-07T01:27:35Z CONTRIBUTOR

okay, some progress!! running some sql against a database file causes that file to get duplicated even if it doesn't apparently change the file.

make a little test script like this:

```python
# test_sql.py
import sqlite3
import sys

db_name = sys.argv[1]
conn = sqlite3.connect(f'file:/app/{db_name}', uri=True)
cur = conn.cursor()
cur.execute('select count(*) from filing')
print(cur.fetchone())
```

then

```docker
RUN python test_sql.py nlrb.db
```

produced a layer that's the same size as nlrb.db!!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
docker image is duplicating db files somehow 1400374908  
1270936982 https://github.com/simonw/datasette/issues/1836#issuecomment-1270936982 https://api.github.com/repos/simonw/datasette/issues/1836 IC_kwDOBm6k_c5LwPWW fgregg 536941 2022-10-07T00:52:41Z 2022-10-07T00:52:41Z CONTRIBUTOR

it's not that the inspect command is somehow changing the db files. if i set them to read-only, the "inspect" layer still has the same very large size.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
docker image is duplicating db files somehow 1400374908  
1270923537 https://github.com/simonw/datasette/issues/1836#issuecomment-1270923537 https://api.github.com/repos/simonw/datasette/issues/1836 IC_kwDOBm6k_c5LwMER fgregg 536941 2022-10-07T00:46:08Z 2022-10-07T00:46:08Z CONTRIBUTOR

i thought it was maybe to do with reading through all the files, but that does not seem to be the case

if i make a little test file like:

```python
# test_read.py
import hashlib
import sys
import pathlib

HASH_BLOCK_SIZE = 1024 * 1024


def inspect_hash(path):
    """Calculate the hash of a database, efficiently."""
    m = hashlib.sha256()
    with path.open("rb") as fp:
        while True:
            data = fp.read(HASH_BLOCK_SIZE)
            if not data:
                break
            m.update(data)

    return m.hexdigest()


inspect_hash(pathlib.Path(sys.argv[1]))
```

then a line in the Dockerfile like

```docker
RUN python test_read.py nlrb.db && echo "[]" > /etc/inspect.json
```

just produces a layer of 3B

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
docker image is duplicating db files somehow 1400374908  
1269847461 https://github.com/simonw/datasette/issues/1480#issuecomment-1269847461 https://api.github.com/repos/simonw/datasette/issues/1480 IC_kwDOBm6k_c5LsFWl fgregg 536941 2022-10-06T11:21:49Z 2022-10-06T11:21:49Z CONTRIBUTOR

thanks @simonw, i'll spend a little more time trying to figure out why this isn't working on cloudrun, and then will flip over to fly if i can't.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Exceeding Cloud Run memory limits when deploying a 4.8G database 1015646369  
1268629159 https://github.com/simonw/datasette/issues/1480#issuecomment-1268629159 https://api.github.com/repos/simonw/datasette/issues/1480 IC_kwDOBm6k_c5Lnb6n fgregg 536941 2022-10-05T16:00:55Z 2022-10-05T16:00:55Z CONTRIBUTOR

as a next step, i'll fetch the docker image from the google registry, and see what memory and disk usage looks like when i run it locally.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Exceeding Cloud Run memory limits when deploying a 4.8G database 1015646369  
1268613335 https://github.com/simonw/datasette/issues/1480#issuecomment-1268613335 https://api.github.com/repos/simonw/datasette/issues/1480 IC_kwDOBm6k_c5LnYDX fgregg 536941 2022-10-05T15:45:49Z 2022-10-05T15:45:49Z CONTRIBUTOR

running into this as i continue to grow my labor data warehouse.

Here a CloudRun PM says the container size should not count against memory: https://stackoverflow.com/a/56570717

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Exceeding Cloud Run memory limits when deploying a 4.8G database 1015646369  
1260909128 https://github.com/simonw/datasette/issues/1062#issuecomment-1260909128 https://api.github.com/repos/simonw/datasette/issues/1062 IC_kwDOBm6k_c5LJ_JI fgregg 536941 2022-09-28T13:22:53Z 2022-09-28T14:09:54Z CONTRIBUTOR

if you went this route:

```python
with sqlite_timelimit(conn, time_limit_ms):
    c.execute(query)
    for chunk in c.fetchmany(chunk_size):
        yield from chunk
```

then time_limit_ms would probably have to be greatly extended, because the time spent in the loop will depend on the downstream processing.

i wonder if this was why you were thinking this feature would need a dedicated connection?


reading more, there's no real limit i can find on the number of active cursors (or more precisely, active prepared statement objects, because sqlite doesn't really have cursors).

maybe something like this would be okay?

```python
with sqlite_timelimit(conn, time_limit_ms):
    c.execute(query)
    # step through at least one to evaluate the statement, not sure if this is necessary
    yield c.execute.fetchone()
    for chunk in c.fetchmany(chunk_size):
        yield from chunk
```

it seems quite weird that there isn't more of a limit on the number of active prepared statements, but i haven't been able to find one.
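
as a rough local illustration (this doesn't prove anything about sqlite's internals), opening many cursors with pending statements against one connection doesn't hit any obvious ceiling:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t(x)")
conn.executemany("insert into t values (?)", [(i,) for i in range(10)])

# each cursor holds its own prepared statement with rows still pending
cursors = []
for _ in range(1000):
    c = conn.cursor()
    c.execute("select x from t")
    cursors.append(c)
print(len(cursors), "cursors with unfinished statements")
```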

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Refactor .csv to be an output renderer - and teach register_output_renderer to stream all rows 732674148  
1260829829 https://github.com/simonw/datasette/issues/1062#issuecomment-1260829829 https://api.github.com/repos/simonw/datasette/issues/1062 IC_kwDOBm6k_c5LJryF fgregg 536941 2022-09-28T12:27:19Z 2022-09-28T12:27:19Z CONTRIBUTOR

for teaching register_output_renderer to stream it seems like the two options are to

  1. a nested query technique to paginate through (see the sketch after this list)
  2. a fetching model that looks something like:

     ```python
     with sqlite_timelimit(conn, time_limit_ms):
         c.execute(query)
         for chunk in c.fetchmany(chunk_size):
             yield from chunk
     ```

     currently db.execute is not a generator, so this would probably need a new method?
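
a minimal sketch of the nested query idea in option 1 (the inner query is just a stand-in for whatever the user wrote):

```sql
select * from (
  -- original user query goes here
  select * from filing
) limit 1000 offset 2000
```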
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Refactor .csv to be an output renderer - and teach register_output_renderer to stream all rows 732674148  
1259718517 https://github.com/simonw/datasette/issues/526#issuecomment-1259718517 https://api.github.com/repos/simonw/datasette/issues/526 IC_kwDOBm6k_c5LFcd1 fgregg 536941 2022-09-27T16:02:51Z 2022-09-27T16:04:46Z CONTRIBUTOR

i think that max_returned_rows is a defense mechanism, just not for connection exhaustion. max_returned_rows is a defense mechanism against memory bombs.

if you are potentially yielding out hundreds of thousands or even millions of rows, you need to be quite careful about data flow to not run out of memory on the server, or on the client.

you have a lot of places in your code that are protective of that right now, but max_returned_rows acts as the final backstop.

so, given that, it makes sense to have removing max_returned_rows altogether be a non-goal, but instead allow for specific codepaths (like streaming csv's) to be able to bypass it.

that could dramatically lower the surface area for a memory-bomb attack.
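
as a sketch of what that bypass's data flow could look like (names are illustrative, not datasette's API), the key is to never hold the full result set in memory:

```python
def stream_rows(cursor, chunk_size=1000):
    # pull rows in fixed-size chunks and hand them straight to the writer,
    # so memory use is bounded by chunk_size rather than by the result size
    while True:
        chunk = cursor.fetchmany(chunk_size)
        if not chunk:
            break
        yield from chunk
```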

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Stream all results for arbitrary SQL and canned queries 459882902  
1258910228 https://github.com/simonw/datasette/issues/526#issuecomment-1258910228 https://api.github.com/repos/simonw/datasette/issues/526 IC_kwDOBm6k_c5LCXIU fgregg 536941 2022-09-27T03:11:07Z 2022-09-27T03:11:07Z CONTRIBUTOR

i think this feature would be safe, as it's really only the time limit that can, and imo should, protect against long running queries, since it is pretty easy to make very expensive queries that don't return many rows.

moving away from max_returned_rows will require some thinking about:

  1. memory usage and data flows to handle potentially very large result sets
  2. how to avoid rendering tens or hundreds of thousands of html rows.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Stream all results for arbitrary SQL and canned queries 459882902  
1258878311 https://github.com/simonw/datasette/issues/526#issuecomment-1258878311 https://api.github.com/repos/simonw/datasette/issues/526 IC_kwDOBm6k_c5LCPVn fgregg 536941 2022-09-27T02:19:48Z 2022-09-27T02:19:48Z CONTRIBUTOR

this sql query doesn't trip up maximum_returned_rows but does timeout

```sql
with recursive counter(x) as (
  select 0
  union
  select x + 1 from counter
)
select * from counter LIMIT 10 OFFSET 100000000
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Stream all results for arbitrary SQL and canned queries 459882902  
1258871525 https://github.com/simonw/datasette/issues/526#issuecomment-1258871525 https://api.github.com/repos/simonw/datasette/issues/526 IC_kwDOBm6k_c5LCNrl fgregg 536941 2022-09-27T02:09:32Z 2022-09-27T02:14:53Z CONTRIBUTOR

thanks @simonw, i learned something i didn't know about sqlite's execution model!

Imagine if Datasette CSVs did allow unlimited retrievals. Someone could hit the CSV endpoint for that recursive query and tie up Datasette's SQL connection effectively forever.

why wouldn't the sqlite_timelimit guard prevent that?


on my local version which has the code to turn off truncations for query csv, sqlite_timelimit does protect me.
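
for context, a minimal sketch of how a guard like sqlite_timelimit can work with the stdlib sqlite3 module (an illustration, not datasette's exact implementation):

```python
import contextlib
import time

@contextlib.contextmanager
def sqlite_timelimit(conn, ms):
    deadline = time.monotonic() + ms / 1000
    # returning a non-zero value from the progress handler aborts the running statement
    def handler():
        return 1 if time.monotonic() > deadline else 0
    conn.set_progress_handler(handler, 1000)  # invoked every ~1000 VM instructions
    try:
        yield
    finally:
        conn.set_progress_handler(None, 1000)
```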

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Stream all results for arbitrary SQL and canned queries 459882902  
1258849766 https://github.com/simonw/datasette/issues/526#issuecomment-1258849766 https://api.github.com/repos/simonw/datasette/issues/526 IC_kwDOBm6k_c5LCIXm fgregg 536941 2022-09-27T01:27:03Z 2022-09-27T01:27:03Z CONTRIBUTOR

i agree with that concern! but if i'm understanding the code correctly, maximum_returned_rows does not protect against long-running queries in any way.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Stream all results for arbitrary SQL and canned queries 459882902  
1258803261 https://github.com/simonw/datasette/pull/1820#issuecomment-1258803261 https://api.github.com/repos/simonw/datasette/issues/1820 IC_kwDOBm6k_c5LB9A9 fgregg 536941 2022-09-27T00:03:09Z 2022-09-27T00:03:09Z CONTRIBUTOR

the pattern in this PR is that max_returned_rows controls the maximum rows rendered through html and json, and the csv renderer bypasses that.

i think it would be better to have each of these different query renderers have more direct control for how many rows to fetch, instead of relying on the internals of the execute method.

generally, users will not want to paginate through tens of thousands of results, but often will want to download a full query as json or as csv.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
[SPIKE] Don't truncate query CSVs 1386456717  
1258337011 https://github.com/simonw/datasette/issues/526#issuecomment-1258337011 https://api.github.com/repos/simonw/datasette/issues/526 IC_kwDOBm6k_c5LALLz fgregg 536941 2022-09-26T16:49:48Z 2022-09-26T16:49:48Z CONTRIBUTOR

i think the smallest change that gets close to what i want is to change the behavior so that max_returned_rows is not applied in the execute method when we are asking for a csv of a query.

there are some infelicities for that approach, but i'll make a PR to make it easier to discuss.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Stream all results for arbitrary SQL and canned queries 459882902  
1258167564 https://github.com/simonw/datasette/issues/526#issuecomment-1258167564 https://api.github.com/repos/simonw/datasette/issues/526 IC_kwDOBm6k_c5K_h0M fgregg 536941 2022-09-26T14:57:44Z 2022-09-26T15:08:36Z CONTRIBUTOR

reading the database execute method i have a few questions.

https://github.com/simonw/datasette/blob/cb1e093fd361b758120aefc1a444df02462389a3/datasette/database.py#L229-L242


unless i'm missing something (which is very likely!!), the max_returned_rows argument doesn't actually offer any protections against running very expensive queries.

It's not like adding a LIMIT max_rows argument. it makes sense that it isn't, because the query could already have a LIMIT argument. Doing something like select * from (query) limit {max_returned_rows} might be protective but wouldn't always be.

Instead the code executes the full original query, and if it still has time it fetches out the first max_rows + 1 rows.
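
in other words, the protection is roughly this pattern (a simplified sketch, not the exact datasette code):

```python
def fetch_truncated(cursor, max_returned_rows):
    # fetch one extra row so we can tell whether the result was cut off
    rows = cursor.fetchmany(max_returned_rows + 1)
    truncated = len(rows) > max_returned_rows
    return rows[:max_returned_rows], truncated
```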

this does offer some protection against memory exhaustion, as you won't hydrate a huge result set into python (however, there are data flow patterns that could avoid that too)

given the current architecture, i don't see how creating a new connection would be of use?


If we just removed the max_returned_rows limitation, then i think most things would be fine except for the QueryViews. Right now, rendering just 5000 rows takes a lot of client-side memory, so some form of pagination would be required.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Stream all results for arbitrary SQL and canned queries 459882902  
1258166572 https://github.com/simonw/datasette/issues/1655#issuecomment-1258166572 https://api.github.com/repos/simonw/datasette/issues/1655 IC_kwDOBm6k_c5K_hks fgregg 536941 2022-09-26T14:57:04Z 2022-09-26T14:57:04Z CONTRIBUTOR

I think that paginating, even in javascript, could be very helpful. Maybe render json or csv into the page and let javascript load that into the dom?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
query result page is using 400mb of browser memory 40x size of html page and 400x size of csv data 1163369515  
1258129113 https://github.com/simonw/datasette/issues/1727#issuecomment-1258129113 https://api.github.com/repos/simonw/datasette/issues/1727 IC_kwDOBm6k_c5K_YbZ fgregg 536941 2022-09-26T14:30:11Z 2022-09-26T14:48:31Z CONTRIBUTOR

from your analysis, it seems like the GIL is blocking on loading of the data from sqlite to python (particularly in the fetchmany call)

this is probably a simplistic idea, but what if you had the python code in the execute method iterate over the cursor and yield out rows or small chunks of rows.

something like:
```python
with sqlite_timelimit(conn, time_limit_ms):
    try:
        cursor = conn.cursor()
        cursor.execute(sql, params if params is not None else {})
    except:
        ...
    max_returned_rows = self.ds.max_returned_rows
    if max_returned_rows == page_size:
        max_returned_rows += 1
    if max_returned_rows and truncate:
        for i, row in enumerate(cursor):
            yield row
            if i == max_returned_rows - 1:
                break
    else:
        for row in cursor:
            yield row
        truncated = False
```

this kind of thing works well with a postgres server side cursor, but i'm not sure if it will hold for sqlite.

you would still spend about the same amount of time in python and would be contending for the gil, but it could be non blocking.

depending on the data flow, this could also have some benefit for memory (data stays in more compact sqlite-land until you need it).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Research: demonstrate if parallel SQL queries are worthwhile 1217759117  
1254064260 https://github.com/simonw/datasette/issues/526#issuecomment-1254064260 https://api.github.com/repos/simonw/datasette/issues/526 IC_kwDOBm6k_c5Kv4CE fgregg 536941 2022-09-21T18:17:04Z 2022-09-21T18:18:01Z CONTRIBUTOR

hi @simonw, this is becoming more of a bother for my labor data warehouse. Is there any research or a spike i could do that would help you investigate this issue?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Stream all results for arbitrary SQL and canned queries 459882902  
1214437408 https://github.com/simonw/datasette/issues/1779#issuecomment-1214437408 https://api.github.com/repos/simonw/datasette/issues/1779 IC_kwDOBm6k_c5IYtgg fgregg 536941 2022-08-14T19:42:58Z 2022-08-14T19:42:58Z CONTRIBUTOR

thanks @simonw!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
google cloudrun updated their limits on maxscale based on memory and cpu count 1334628400  
1210675046 https://github.com/simonw/datasette/issues/1779#issuecomment-1210675046 https://api.github.com/repos/simonw/datasette/issues/1779 IC_kwDOBm6k_c5IKW9m fgregg 536941 2022-08-10T13:28:37Z 2022-08-10T13:28:37Z CONTRIBUTOR

maybe a simpler solution is to set the maxscale to like 2? since datasette is not set up to make use of container scaling anyway?
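
if that's the route, something like this is probably all it takes (a sketch; the service name is a placeholder):

```bash
# cap Cloud Run autoscaling for an already-deployed service
gcloud run services update my-datasette --max-instances 2
```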

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
google cloudrun updated their limits on maxscale based on memory and cpu count 1334628400  
1190277829 https://github.com/simonw/sqlite-utils/issues/456#issuecomment-1190277829 https://api.github.com/repos/simonw/sqlite-utils/issues/456 IC_kwDOCGYnMM5G8jLF fgregg 536941 2022-07-20T13:19:15Z 2022-07-20T13:19:15Z CONTRIBUTOR

hadley wickham's melt and reshape could be good inspo: http://had.co.nz/reshape/introduction.pdf

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
feature request: pivot command 1310243385  
1190272780 https://github.com/simonw/sqlite-utils/issues/456#issuecomment-1190272780 https://api.github.com/repos/simonw/sqlite-utils/issues/456 IC_kwDOCGYnMM5G8h8M fgregg 536941 2022-07-20T13:14:54Z 2022-07-20T13:14:54Z CONTRIBUTOR

for example, i have data on votes that look like this:

| ballot_id | option_id | choice |
|-|-|-|
| 1 | 1 | 0 |
| 1 | 2 | 1 |
| 1 | 3 | 0 |
| 1 | 4 | 1 |
| 2 | 1 | 1 |
| 2 | 2 | 0 |
| 2 | 3 | 1 |
| 2 | 4 | 0 |

and i want to reshape from this long form to this wide form:

| ballot_id | option_id_1 | option_id_2 | option_id_3 | option_id_4 |
|-|-|-|-|-|
| 1 | 0 | 1 | 0 | 1 |
| 2 | 1 | 0 | 1 | 0 |

i could do such a thing like this:

```sql
select
  ballot_id,
  sum(choice) filter (where option_id = 1) as option_id_1,
  sum(choice) filter (where option_id = 2) as option_id_2,
  sum(choice) filter (where option_id = 3) as option_id_3,
  sum(choice) filter (where option_id = 4) as option_id_4
from vote
group by ballot_id
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
feature request: pivot command 1310243385  
1189010812 https://github.com/simonw/sqlite-utils/issues/423#issuecomment-1189010812 https://api.github.com/repos/simonw/sqlite-utils/issues/423 IC_kwDOCGYnMM5G3t18 fgregg 536941 2022-07-19T12:47:39Z 2022-07-19T12:47:39Z CONTRIBUTOR

just ran into this!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
.extract() doesn't set foreign key when extracted columns contain NULL value 1199158210  
1103312860 https://github.com/simonw/datasette/issues/1713#issuecomment-1103312860 https://api.github.com/repos/simonw/datasette/issues/1713 IC_kwDOBm6k_c5Bwzfc fgregg 536941 2022-04-20T00:52:19Z 2022-04-20T00:52:19Z CONTRIBUTOR

feels related to #1402

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Datasette feature for publishing snapshots of query results 1203943272  
1087428593 https://github.com/simonw/datasette/issues/1549#issuecomment-1087428593 https://api.github.com/repos/simonw/datasette/issues/1549 IC_kwDOBm6k_c5A0Nfx fgregg 536941 2022-04-04T11:17:13Z 2022-04-04T11:17:13Z CONTRIBUTOR

another way to get the behavior of downloading the file is to use the download attribute of the anchor tag

https://developer.mozilla.org/en-US/docs/Web/HTML/Element/a#attr-download

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Redesign CSV export to improve usability 1077620955  
1078126065 https://github.com/simonw/datasette/issues/1684#issuecomment-1078126065 https://api.github.com/repos/simonw/datasette/issues/1684 IC_kwDOBm6k_c5AQuXx fgregg 536941 2022-03-24T20:08:56Z 2022-03-24T20:13:19Z CONTRIBUTOR

would be nice if the behavior was

  1. try to facet all the columns
  2. for bigger tables try to facet the indexed columns
  3. for the biggest tables, turn off autofacetting completely

This is based on my assumption that what determines autofaceting is the rarity of unique values. Which may not be true!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Mechanism for disabling faceting on large tables only 1179998071  
1077047295 https://github.com/simonw/datasette/issues/1581#issuecomment-1077047295 https://api.github.com/repos/simonw/datasette/issues/1581 IC_kwDOBm6k_c5AMm__ fgregg 536941 2022-03-24T04:08:18Z 2022-03-24T04:08:18Z CONTRIBUTOR

this has been addressed by the datasette-hashed-urls plugin

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
when hashed urls are turned on, the _memory db has improperly long-lived cache expiry 1089529555  
1077047152 https://github.com/simonw/datasette/pull/1582#issuecomment-1077047152 https://api.github.com/repos/simonw/datasette/issues/1582 IC_kwDOBm6k_c5AMm9w fgregg 536941 2022-03-24T04:07:58Z 2022-03-24T04:07:58Z CONTRIBUTOR

this has been obviated by the datasette-hashed-urls plugin

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
don't set far expiry if hash is '000' 1090055810  
1062450649 https://github.com/simonw/datasette/issues/1655#issuecomment-1062450649 https://api.github.com/repos/simonw/datasette/issues/1655 IC_kwDOBm6k_c4_U7XZ fgregg 536941 2022-03-09T01:10:46Z 2022-03-09T01:10:46Z CONTRIBUTOR

i increased the max_returned_rows, because I have some scripts that get CSVs from this site, and this makes doing pagination of CSVs less annoying for many cases. i know that streaming csvs is something you are hoping to address in 1.0. let me know if there's anything i can do to help with that.

as for what if anything can be done about the size of the dom, I don't have any ideas right now, but i'll poke around.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
query result page is using 400mb of browser memory 40x size of html page and 400x size of csv data 1163369515  
1049879118 https://github.com/simonw/datasette/issues/1641#issuecomment-1049879118 https://api.github.com/repos/simonw/datasette/issues/1641 IC_kwDOBm6k_c4-k-JO fgregg 536941 2022-02-24T13:49:26Z 2022-02-24T13:49:26Z CONTRIBUTOR

maybe worth considering adding buttons for paren, asterisk, etc. under the input text box on mobile?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Tweak mobile keyboard settings 1149310456  
1033332570 https://github.com/simonw/sqlite-utils/issues/403#issuecomment-1033332570 https://api.github.com/repos/simonw/sqlite-utils/issues/403 IC_kwDOCGYnMM49l2da fgregg 536941 2022-02-09T04:22:43Z 2022-02-09T04:22:43Z CONTRIBUTOR

dddoooope

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Document how to add a primary key to a rowid table using `sqlite-utils transform --pk` 1126692066  
1032126353 https://github.com/simonw/sqlite-utils/issues/403#issuecomment-1032126353 https://api.github.com/repos/simonw/sqlite-utils/issues/403 IC_kwDOCGYnMM49hP-R fgregg 536941 2022-02-08T01:45:15Z 2022-02-08T01:45:31Z CONTRIBUTOR

you can hack something like this to achieve this result:

sqlite-utils convert my_database my_table rowid "{'id': value}" --multi

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Document how to add a primary key to a rowid table using `sqlite-utils transform --pk` 1126692066  
1032120014 https://github.com/simonw/sqlite-utils/issues/26#issuecomment-1032120014 https://api.github.com/repos/simonw/sqlite-utils/issues/26 IC_kwDOCGYnMM49hObO fgregg 536941 2022-02-08T01:32:34Z 2022-02-08T01:32:34Z CONTRIBUTOR

if you are curious about prior art, https://github.com/jsnell/json-to-multicsv is really good!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Mechanism for turning nested JSON into foreign keys / many-to-many 455486286  
1009548580 https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1009548580 https://api.github.com/repos/simonw/sqlite-utils/issues/365 IC_kwDOCGYnMM48LH0k fgregg 536941 2022-01-11T02:43:34Z 2022-01-11T02:43:34Z CONTRIBUTOR

thanks so much! always a pleasure to see how you work through these things

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
create-index should run analyze after creating index 1096558279  
1008275546 https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1008275546 https://api.github.com/repos/simonw/sqlite-utils/issues/365 IC_kwDOCGYnMM48GRBa fgregg 536941 2022-01-09T11:01:15Z 2022-01-09T13:37:51Z CONTRIBUTOR

i don’t want to be such a partisan for analyze, but the query planner deciding not to use an index based on information collected by analyze is not necessarily a bug; it could be the correct choice.

<s>the original poster in that stack overflow doesn’t say there’s a performance regression </s>

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
create-index should run analyze after creating index 1096558279  
1008166084 https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1008166084 https://api.github.com/repos/simonw/sqlite-utils/issues/365 IC_kwDOCGYnMM48F2TE fgregg 536941 2022-01-08T22:32:47Z 2022-01-08T22:32:47Z CONTRIBUTOR

or using “pragma optimize”

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
create-index should run analyze after creating index 1096558279  
1008164786 https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1008164786 https://api.github.com/repos/simonw/sqlite-utils/issues/365 IC_kwDOCGYnMM48F1-y fgregg 536941 2022-01-08T22:24:19Z 2022-01-08T22:24:19Z CONTRIBUTOR

the out-of-date scenario you describe could be addressed by automatically adding an analyze to the insert or convert commands if they implicate an index

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
create-index should run analyze after creating index 1096558279  
1008164116 https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1008164116 https://api.github.com/repos/simonw/sqlite-utils/issues/365 IC_kwDOCGYnMM48F10U fgregg 536941 2022-01-08T22:18:57Z 2022-01-08T22:18:57Z CONTRIBUTOR

the table where the query ran so badly was about 50k.

i think the scenario should not be worse than no stats.

i also did not know that sqlite was so different from postgres and needed an explicit analyze call.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
create-index should run analyze after creating index 1096558279  
1008161965 https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1008161965 https://api.github.com/repos/simonw/sqlite-utils/issues/365 IC_kwDOCGYnMM48F1St fgregg 536941 2022-01-08T22:02:56Z 2022-01-08T22:02:56Z CONTRIBUTOR

for options 2 and 3, i would worry about discoverability.

in other db’s it is not necessary to explicitly call analyze for most indices. ie for postgres

The system regularly collects statistics on all of a table's columns. Newly-created non-expression indexes can immediately use these statistics to determine an index's usefulness.

i suppose i would propose raising a warning if the stats table is created that explains what is going on and informs users about a --no-analyze argument.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
create-index should run analyze after creating index 1096558279  
1007844190 https://github.com/simonw/datasette/pull/1574#issuecomment-1007844190 https://api.github.com/repos/simonw/datasette/issues/1574 IC_kwDOBm6k_c48Ente fgregg 536941 2022-01-08T00:42:12Z 2022-01-08T00:42:12Z CONTRIBUTOR

is there a reason to not always use the slim option?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
introduce new option for datasette package to use a slim base image 1084193403  
1007636709 https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1007636709 https://api.github.com/repos/simonw/sqlite-utils/issues/365 IC_kwDOCGYnMM48D1Dl fgregg 536941 2022-01-07T18:28:33Z 2022-01-07T18:29:43Z CONTRIBUTOR

i added an index to one table with sqlite-utils, and then a query that used to take about 1 second started taking hundreds of seconds.

running analyze got me back to sub second speed.
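
for anyone hitting the same thing, the fix is just running ANALYZE once the index exists; a minimal sketch with the stdlib (the database path and index here are made up):

```python
import sqlite3

conn = sqlite3.connect("my_database.db")
conn.execute("CREATE INDEX IF NOT EXISTS idx_filing_date ON filing(date)")
conn.execute("ANALYZE")  # refresh planner statistics so the new index is costed correctly
conn.commit()
conn.close()
```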

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
create-index should run analyze after creating index 1096558279  
1002825217 https://github.com/simonw/datasette/issues/1583#issuecomment-1002825217 https://api.github.com/repos/simonw/datasette/issues/1583 IC_kwDOBm6k_c47xeYB fgregg 536941 2021-12-30T00:34:16Z 2021-12-30T00:34:16Z CONTRIBUTOR

if that is not desirable, it might be good to document that users might want to set up a lifecycle rule to automatically delete these build artifacts. something like https://stackoverflow.com/questions/59937542/can-i-delete-container-images-from-google-cloud-storage-artifacts-bucket
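
for documentation purposes, a lifecycle rule like this is probably enough (a sketch; the bucket name and the 30-day age are placeholders):

```bash
# auto-delete Cloud Build artifacts older than 30 days
cat > lifecycle.json <<'EOF'
{"rule": [{"action": {"type": "Delete"}, "condition": {"age": 30}}]}
EOF
gsutil lifecycle set lifecycle.json gs://my-project_cloudbuild
```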

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
consider adding deletion step of cloudbuild artifacts to gcloud publish 1090810196  
997128712 https://github.com/simonw/datasette/issues/1561#issuecomment-997128712 https://api.github.com/repos/simonw/datasette/issues/1561 IC_kwDOBm6k_c47bvoI fgregg 536941 2021-12-18T02:35:48Z 2021-12-18T02:35:48Z CONTRIBUTOR

interesting! i love this feature. this + full caching with cloudflare is really super!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
add hash id to "_memory" url if hashed url mode is turned on and crossdb is also turned on 1082765654  
993078038 https://github.com/simonw/datasette/issues/526#issuecomment-993078038 https://api.github.com/repos/simonw/datasette/issues/526 IC_kwDOBm6k_c47MSsW fgregg 536941 2021-12-14T01:46:52Z 2021-12-14T01:46:52Z CONTRIBUTOR

the nested query idea is very nice, and i stole it for my client side paginator. However, it won't do the right thing if the original query orders by random().

If you go the nested query route, maybe raise a 4XX status code if the query has such a clause?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Stream all results for arbitrary SQL and canned queries 459882902  
993014772 https://github.com/simonw/datasette/issues/1553#issuecomment-993014772 https://api.github.com/repos/simonw/datasette/issues/1553 IC_kwDOBm6k_c47MDP0 fgregg 536941 2021-12-13T23:46:18Z 2021-12-13T23:46:18Z CONTRIBUTOR

these headers would also be relevant for json exports of custom queries

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
if csv export is truncated in non streaming mode set informative response header 1079111498  
992986587 https://github.com/simonw/datasette/issues/1553#issuecomment-992986587 https://api.github.com/repos/simonw/datasette/issues/1553 IC_kwDOBm6k_c47L8Xb fgregg 536941 2021-12-13T22:57:04Z 2021-12-13T22:57:04Z CONTRIBUTOR

would also be good if the header said what the max row limit was

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
if csv export is truncated in non streaming mode set informative response header 1079111498  
992971072 https://github.com/simonw/datasette/issues/526#issuecomment-992971072 https://api.github.com/repos/simonw/datasette/issues/526 IC_kwDOBm6k_c47L4lA fgregg 536941 2021-12-13T22:29:34Z 2021-12-13T22:29:34Z CONTRIBUTOR

just came by to open this issue. would make my data analysis in observable a lot better!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Stream all results for arbitrary SQL and canned queries 459882902  
991754237 https://github.com/simonw/datasette/issues/1549#issuecomment-991754237 https://api.github.com/repos/simonw/datasette/issues/1549 IC_kwDOBm6k_c47HPf9 fgregg 536941 2021-12-11T19:14:39Z 2021-12-11T19:14:39Z CONTRIBUTOR

that option is not available on custom queries.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Redesign CSV export to improve usability 1077620955  
991405755 https://github.com/simonw/sqlite-utils/issues/353#issuecomment-991405755 https://api.github.com/repos/simonw/sqlite-utils/issues/353 IC_kwDOCGYnMM47F6a7 fgregg 536941 2021-12-11T01:38:29Z 2021-12-11T01:38:29Z CONTRIBUTOR

wow! that's awesome! thanks so much, @simonw!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Allow passing a file of code to "sqlite-utils convert" 1077102934  
964205475 https://github.com/simonw/sqlite-utils/issues/26#issuecomment-964205475 https://api.github.com/repos/simonw/sqlite-utils/issues/26 IC_kwDOCGYnMM45eJuj fgregg 536941 2021-11-09T14:31:29Z 2021-11-09T14:31:29Z CONTRIBUTOR

i was just reaching for a tool to do this this morning

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Mechanism for turning nested JSON into foreign keys / many-to-many 455486286  
954384496 https://github.com/simonw/datasette/pull/1495#issuecomment-954384496 https://api.github.com/repos/simonw/datasette/issues/1495 IC_kwDOBm6k_c444sBw fgregg 536941 2021-10-29T03:07:13Z 2021-10-29T03:07:13Z CONTRIBUTOR

okay @simonw, made the requested changes. tests are running locally. i think this is ready for you to look at again.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Allow routes to have extra options 1033678984  
949604763 https://github.com/simonw/datasette/issues/1284#issuecomment-949604763 https://api.github.com/repos/simonw/datasette/issues/1284 IC_kwDOBm6k_c44mdGb fgregg 536941 2021-10-22T12:54:34Z 2021-10-22T12:54:34Z CONTRIBUTOR

i'm going to take a swing at this today. we'll see.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Feature or Documentation Request: Individual table as home page template 845794436  
893114612 https://github.com/simonw/datasette/issues/1419#issuecomment-893114612 https://api.github.com/repos/simonw/datasette/issues/1419 IC_kwDOBm6k_c41O9j0 fgregg 536941 2021-08-05T02:29:06Z 2021-08-05T02:29:06Z CONTRIBUTOR

there's a lot of complexity here, that's probably not worth addressing. i got what i needed by patching the dockerfile that cloudrun uses to install a newer version of sqlite.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
`publish cloudrun` should deploy a more recent SQLite version 959710008  
892276385 https://github.com/simonw/datasette/issues/1419#issuecomment-892276385 https://api.github.com/repos/simonw/datasette/issues/1419 IC_kwDOBm6k_c41Lw6h fgregg 536941 2021-08-04T00:58:49Z 2021-08-04T00:58:49Z CONTRIBUTOR

yes, filter clauses on aggregate queries were added to sqlite3 in 3.30
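
for example (assuming a hypothetical issues table), this only works on sqlite 3.30+:

```sql
select
  count(*) filter (where state = 'open') as open_count,
  count(*) filter (where state = 'closed') as closed_count
from issues;
```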

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
`publish cloudrun` should deploy a more recent SQLite version 959710008  
884910320 https://github.com/simonw/datasette/issues/1401#issuecomment-884910320 https://api.github.com/repos/simonw/datasette/issues/1401 IC_kwDOBm6k_c40vqjw fgregg 536941 2021-07-22T13:26:01Z 2021-07-22T13:26:01Z CONTRIBUTOR

ordered lists didn't work either, btw

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
unordered list is not rendering bullet points in description_html on database page 950664971  


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
, [performed_via_github_app] TEXT);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);