github

This data as json, CSV

html_url	issue_url	id	node_id	user	created_at	updated_at	author_association	body	reactions	issue
https://github.com/simonw/sqlite-utils/issues/297#issuecomment-1262920929	https://api.github.com/repos/simonw/sqlite-utils/issues/297	1262920929	IC_kwDOCGYnMM5LRqTh	9599	2022-09-29T23:06:44Z	2022-09-29T23:06:44Z	OWNER	Currently the only other use of `-t` is for this: ``` -t, --table Output as a formatted table ``` So I think it's OK to use it to mean something slightly different for this command, since `sqlite-utils insert` doesn't do any output of data in any format.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	944846776
https://github.com/simonw/sqlite-utils/issues/297#issuecomment-1262918833	https://api.github.com/repos/simonw/sqlite-utils/issues/297	1262918833	IC_kwDOCGYnMM5LRpyx	9599	2022-09-29T23:02:52Z	2022-09-29T23:02:52Z	OWNER	The other nice thing about having this as a separate command is that I can implement a tiny subset of the overall `sqlite-utils insert` features at first, and then add additional features in subsequent releases.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	944846776
https://github.com/simonw/sqlite-utils/issues/297#issuecomment-1262917059	https://api.github.com/repos/simonw/sqlite-utils/issues/297	1262917059	IC_kwDOCGYnMM5LRpXD	9599	2022-09-29T22:59:28Z	2022-09-29T22:59:28Z	OWNER	I quite like `sqlite-utils fast-csv` - I think it's clear enough what it does, and running `--help` can clarify if needed.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	944846776
https://github.com/simonw/sqlite-utils/issues/297#issuecomment-1262915322	https://api.github.com/repos/simonw/sqlite-utils/issues/297	1262915322	IC_kwDOCGYnMM5LRo76	9599	2022-09-29T22:57:31Z	2022-09-29T22:57:42Z	OWNER	Maybe `sqlite-utils fast-csv` is right? Not entirely clear that's an insert though as opposed to a faster version of in-memory querying in the style of `sqlite-utils memory`.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	944846776
https://github.com/simonw/sqlite-utils/issues/297#issuecomment-1262914416	https://api.github.com/repos/simonw/sqlite-utils/issues/297	1262914416	IC_kwDOCGYnMM5LRotw	9599	2022-09-29T22:56:53Z	2022-09-29T22:56:53Z	OWNER	Potential names/designs: - `sqlite-utils fast data.db rows rows.csv` - `sqlite-utils insert-fast data.db rows rows.csv` - `sqlite-utils fast-csv data.db rows rows.csv` Or more interestingly... what if it could accept multiple CSV files to create multiple tables? - `sqlite-utils fast data.db rows.csv other.csv` Would still need to support creating tables with different names though. Maybe like this: - `sqlite-utils fast data.db -t mytable rows.csv -t othertable other.csv` I seem to be leaning towards `fast` as the command name, but as a standalone command name it's a bit meaningless - how do we know that's about CSV import and not about fast querying or similar?	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	944846776
https://github.com/simonw/sqlite-utils/issues/297#issuecomment-1262913145	https://api.github.com/repos/simonw/sqlite-utils/issues/297	1262913145	IC_kwDOCGYnMM5LRoZ5	9599	2022-09-29T22:54:13Z	2022-09-29T22:54:13Z	OWNER	After reviewing `sqlite-utils insert --help` I'm confident that MOST of these options wouldn't make sense for a "fast" moder that just supports CSV and works by piping directly to the `sqlite3` binary: https://github.com/simonw/sqlite-utils/blob/d792dad1cf5f16525da81b1e162fb71d469995f3/docs/cli-reference.rst#L251-L279 I'm going to implement a separate command instead.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	944846776
https://github.com/simonw/sqlite-utils/issues/297#issuecomment-1247161510	https://api.github.com/repos/simonw/sqlite-utils/issues/297	1247161510	IC_kwDOCGYnMM5KViym	9599	2022-09-14T18:39:50Z	2022-09-14T18:39:50Z	OWNER	Wrote that up as a TIL: https://til.simonwillison.net/python/pypy-macos	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	944846776
https://github.com/simonw/sqlite-utils/issues/297#issuecomment-1247149969	https://api.github.com/repos/simonw/sqlite-utils/issues/297	1247149969	IC_kwDOCGYnMM5KVf-R	9599	2022-09-14T18:28:53Z	2022-09-14T18:29:34Z	OWNER	As an aside, https://avi.im/blag/2021/fast-sqlite-inserts/ inspired my to try pypy since that article claimed to get a 2.5x speedup using pypy compared to regular Python for a CSV import script. Setup: ``` brew install pypy3 cd /tmp pypy3 -m venv venv source venv/bin/activate pip install sqlite-utils ``` I grabbed the first 760M of that `https://static.openfoodfacts.org/data/en.openfoodfacts.org.products.csv` file (didn't wait for the whole thing to download). Then: ``` time sqlite-utils insert pypy.db t en.openfoodfacts.org.products.csv --csv [------------------------------------] 0% [###################################-] 99% 11.76s user 2.26s system 93% cpu 14.981 total ``` Compared to regular Python `sqlite-utils` doing the same thing: ``` time sqlite-utils insert py.db t en.openfoodfacts.org.products.csv --csv [------------------------------------] 0% [###################################-] 99% 11.36s user 2.06s system 93% cpu 14.341 total ``` So no perceivable performance difference.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	944846776
https://github.com/simonw/sqlite-utils/issues/297#issuecomment-1246978641	https://api.github.com/repos/simonw/sqlite-utils/issues/297	1246978641	IC_kwDOCGYnMM5KU2JR	9599	2022-09-14T15:57:41Z	2022-09-14T15:57:41Z	OWNER	One solution suggested on Discord: ``` wget https://static.openfoodfacts.org/data/en.openfoodfacts.org.products.csv CREATE=`curl -s -L https://gist.githubusercontent.com/CharlesNepote/80fb813a416ad445fdd6e4738b4c8156/raw/032af70de631ff1c4dd09d55360f242949dcc24f/create.sql` INDEX=`curl -s -L https://gist.githubusercontent.com/CharlesNepote/80fb813a416ad445fdd6e4738b4c8156/raw/032af70de631ff1c4dd09d55360f242949dcc24f/index.sql` time sqlite3 products_new.db <<EOS /* Optimisations. See: https://avi.im/blag/2021/fast-sqlite-inserts/ /; PRAGMA page_size = 32768; $CREATE .mode ascii .separator "\t" "\n" .import --skip 1 en.openfoodfacts.org.products.csv all $INDEX EOS # Converting empty to NULL for columns which are either FLOAT or INTEGER time sqlite3 products.db ".schema all" \| sed -nr 's/.\[(.)\] (INTEGER\|FLOAT)./\1/gp' \| xargs -I % sqlite3 products.db -cmd "PRAGMA journal_mode=OFF;" "UPDATE [all] SET [%] = NULL WHERE [%] = '';" ```	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	944846776
https://github.com/simonw/sqlite-utils/issues/297#issuecomment-1246977989	https://api.github.com/repos/simonw/sqlite-utils/issues/297	1246977989	IC_kwDOCGYnMM5KU1_F	9599	2022-09-14T15:57:09Z	2022-09-14T15:57:09Z	OWNER	Should consider how this could best handle creating columns that are integer and float as opposed to just text. https://discord.com/channels/823971286308356157/823971286941302908/1019630014544748584 is a relevant discussion on Discord. Even if you create the schema in advance with the correct column types, this import mechanism can put empty strings in blank float/integer columns when ideally you would want to have nulls. Related feature idea for `sqlite-utils transform`: - #488 Not sure how best to handle this for `sqlite3 .import` imports.	{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	944846776
https://github.com/simonw/sqlite-utils/issues/297#issuecomment-1240882245	https://api.github.com/repos/simonw/sqlite-utils/issues/297	1240882245	IC_kwDOCGYnMM5J9lxF	9599	2022-09-08T15:33:11Z	2022-09-08T15:33:11Z	OWNER	The more I think about this the more I like it - particularly for `sqlite-utils fast-insert` where differences in features aren't a problem. I used a variant of this trick with parquet files here: https://simonwillison.net/2022/Sep/5/laion-aesthetics-weeknotes/	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	944846776
https://github.com/simonw/sqlite-utils/issues/297#issuecomment-1162231111	https://api.github.com/repos/simonw/sqlite-utils/issues/297	1162231111	IC_kwDOCGYnMM5FRj1H	9599	2022-06-21T19:25:44Z	2022-06-21T19:25:44Z	OWNER	Pushed that prototype to a branch.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	944846776
https://github.com/simonw/sqlite-utils/issues/297#issuecomment-1162223668	https://api.github.com/repos/simonw/sqlite-utils/issues/297	1162223668	IC_kwDOCGYnMM5FRiA0	9599	2022-06-21T19:19:22Z	2022-06-21T19:22:15Z	OWNER	Built a prototype of `--fast` for the `sqlite-utils memory` command: ``` % time sqlite-utils memory taxi.csv 'SELECT passenger_count, COUNT(), AVG(total_amount) FROM taxi GROUP BY passenger_count' --fast passenger_count COUNT() AVG(total_amount) --------------- -------- ----------------- 128020 32.2371511482553 0 42228 17.0214016766151 1 1533197 17.6418833067999 2 286461 18.0975870711456 3 72852 17.9153958710923 4 25510 18.452774990196 5 50291 17.2709248175672 6 32623 17.6002964166367 7 2 87.17 8 2 95.705 9 1 113.6 sqlite-utils memory taxi.csv --fast 12.71s user 0.48s system 104% cpu 12.627 total ``` Takes 13s - about the same time as calling `sqlite3 :memory: ...` directly as seen in https://til.simonwillison.net/sqlite/one-line-csv-operations Without the `--fast` option that takes several minutes (262s = 4m20s)! Here's the prototype so far: ```diff diff --git a/sqlite_utils/cli.py b/sqlite_utils/cli.py index 86eddfb..1c83ef6 100644 --- a/sqlite_utils/cli.py +++ b/sqlite_utils/cli.py @@ -14,6 +14,8 @@ import io import itertools import json import os +import shutil +import subprocess import sys import csv as csv_std import tabulate @@ -1669,6 +1671,7 @@ def query( is_flag=True, help="Analyze resulting tables and output results", ) +@click.option("--fast", is_flag=True, help="Fast mode, only works with CSV and TSV") @load_extension_option def memory( paths, @@ -1692,6 +1695,7 @@ def memory( save, analyze, load_extension, + fast, ): """Execute SQL query against an in-memory database, optionally populated by imported data @@ -1719,6 +1723,22 @@ def memory( \b sqlite-utils memory animals.csv --schema """ + if fast: + …	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	944846776
https://github.com/simonw/sqlite-utils/issues/297#issuecomment-1162179354	https://api.github.com/repos/simonw/sqlite-utils/issues/297	1162179354	IC_kwDOCGYnMM5FRXMa	9599	2022-06-21T18:44:03Z	2022-06-21T18:44:03Z	OWNER	The thing I like about that `--fast` option is that it could selectively use this alternative mechanism just for the files for which it can work (CSV and TSV files). I could also add a `--fast` option to `sqlite-utils memory` which could then kick in only for operations that involve just TSV and CSV files.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	944846776
https://github.com/simonw/sqlite-utils/issues/297#issuecomment-1161849874	https://api.github.com/repos/simonw/sqlite-utils/issues/297	1161849874	IC_kwDOCGYnMM5FQGwS	9599	2022-06-21T14:49:12Z	2022-06-21T14:49:12Z	OWNER	Since there are all sorts of existing options for `sqlite-utils insert` that won't work with this, maybe it would be better to have an entirely separate command - this for example: sqlite-utils fast-insert data.db mytable data.csv	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	944846776
https://github.com/simonw/sqlite-utils/issues/297#issuecomment-882052693	https://api.github.com/repos/simonw/sqlite-utils/issues/297	882052693	IC_kwDOCGYnMM40kw5V	9599	2021-07-18T12:57:54Z	2022-06-21T13:17:15Z	OWNER	Another implementation option would be to use the CSV virtual table mechanism. This could avoid shelling out to the `sqlite3` binary, but requires solving the harder problem of compiling and distributing a loadable SQLite module: https://www.sqlite.org/csv.html (Would be neat to produce a Python wheel of this, see https://simonwillison.net/2022/May/23/bundling-binary-tools-in-python-wheels/) This would also help solve the challenge of making this optimization available to the `sqlite-utils memory` command. That command operates against an in-memory database so it's not obvious how it could shell out to a binary.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	944846776
https://github.com/simonw/sqlite-utils/issues/297#issuecomment-1160991031	https://api.github.com/repos/simonw/sqlite-utils/issues/297	1160991031	IC_kwDOCGYnMM5FM1E3	9599	2022-06-21T00:35:20Z	2022-06-21T00:35:20Z	OWNER	Relevant TIL: https://til.simonwillison.net/sqlite/one-line-csv-operations	{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	944846776
https://github.com/simonw/sqlite-utils/issues/297#issuecomment-882052852	https://api.github.com/repos/simonw/sqlite-utils/issues/297	882052852	IC_kwDOCGYnMM40kw70	9599	2021-07-18T12:59:20Z	2021-07-18T12:59:20Z	OWNER	I'm not too worried about `sqlite-utils memory` because if your data is large enough that you can benefit from this optimization you probably should use a real file as opposed to a disposable memory database when analyzing it.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	944846776
https://github.com/simonw/sqlite-utils/issues/297#issuecomment-880259255	https://api.github.com/repos/simonw/sqlite-utils/issues/297	880259255	MDEyOklzc3VlQ29tbWVudDg4MDI1OTI1NQ==	9599	2021-07-14T22:48:41Z	2021-07-14T22:48:41Z	OWNER	Should also take advantage of `.mode tabs` to support `sqlite-utils insert blah.db blah blah.csv --tsv --fast`	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	944846776
https://github.com/simonw/sqlite-utils/issues/297#issuecomment-880257587	https://api.github.com/repos/simonw/sqlite-utils/issues/297	880257587	MDEyOklzc3VlQ29tbWVudDg4MDI1NzU4Nw==	9599	2021-07-14T22:44:05Z	2021-07-14T22:44:05Z	OWNER	https://unix.stackexchange.com/a/642364 suggests you can also use this to import from stdin, like so: sqlite3 -csv $database_file_name ".import '\|cat -' $table_name" Here the `sqlite3 -csv` is an alternative to using `.mode csv`.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	944846776
https://github.com/simonw/sqlite-utils/issues/297#issuecomment-880256865	https://api.github.com/repos/simonw/sqlite-utils/issues/297	880256865	MDEyOklzc3VlQ29tbWVudDg4MDI1Njg2NQ==	9599	2021-07-14T22:42:11Z	2021-07-14T22:42:11Z	OWNER	Potential workaround for missing `--skip` implementation is that the filename can be a command instead, so maybe it could shell out to `tail -n +1 filename`: > The source argument is the name of a file to be read or, if it begins with a "\|" character, specifies a command which will be run to produce the input CSV data.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	944846776
https://github.com/simonw/sqlite-utils/issues/297#issuecomment-880256058	https://api.github.com/repos/simonw/sqlite-utils/issues/297	880256058	MDEyOklzc3VlQ29tbWVudDg4MDI1NjA1OA==	9599	2021-07-14T22:40:01Z	2021-07-14T22:40:47Z	OWNER	Full docs here: https://www.sqlite.org/draft/cli.html#csv One catch: how this works has changed in recent SQLite versions: https://www.sqlite.org/changes.html - 2020-12-01 (3.34.0) - "Table name quoting works correctly for the .import dot-command" - 2020-05-22 (3.32.0) - "Add options to the .import command: --csv, --ascii, --skip" - 2017-08-01 (3.20.0) - " The ".import" command ignores an initial UTF-8 BOM." The "skip" feature is particularly important to understand. https://www.sqlite.org/draft/cli.html#csv says: > There are two cases to consider: (1) Table "tab1" does not previously exist and (2) table "tab1" does already exist. > > In the first case, when the table does not previously exist, the table is automatically created and the content of the first row of the input CSV file is used to determine the name of all the columns in the table. In other words, if the table does not previously exist, the first row of the CSV file is interpreted to be column names and the actual data starts on the second row of the CSV file. > > For the second case, when the table already exists, every row of the CSV file, including the first row, is assumed to be actual content. If the CSV file contains an initial row of column labels, you can cause the .import command to skip that initial row using the "--skip 1" option. But the `--skip 1` option is only available in 3.32.0 and higher.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	944846776

github

Custom SQL query returning 22 rows (hide)

Query parameters