github
html_url | issue_url | id | node_id | user | created_at | updated_at | author_association | body | reactions | issue | performed_via_github_app |
---|---|---|---|---|---|---|---|---|---|---|---|
https://github.com/simonw/sqlite-utils/issues/297#issuecomment-1162231111 | https://api.github.com/repos/simonw/sqlite-utils/issues/297 | 1162231111 | IC_kwDOCGYnMM5FRj1H | 9599 | 2022-06-21T19:25:44Z | 2022-06-21T19:25:44Z | OWNER | Pushed that prototype to a branch. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | 944846776 | |
https://github.com/simonw/sqlite-utils/issues/297#issuecomment-1162223668 | https://api.github.com/repos/simonw/sqlite-utils/issues/297 | 1162223668 | IC_kwDOCGYnMM5FRiA0 | 9599 | 2022-06-21T19:19:22Z | 2022-06-21T19:22:15Z | OWNER | Built a prototype of `--fast` for the `sqlite-utils memory` command: ``` % time sqlite-utils memory taxi.csv 'SELECT passenger_count, COUNT(*), AVG(total_amount) FROM taxi GROUP BY passenger_count' --fast passenger_count COUNT(*) AVG(total_amount) --------------- -------- ----------------- 128020 32.2371511482553 0 42228 17.0214016766151 1 1533197 17.6418833067999 2 286461 18.0975870711456 3 72852 17.9153958710923 4 25510 18.452774990196 5 50291 17.2709248175672 6 32623 17.6002964166367 7 2 87.17 8 2 95.705 9 1 113.6 sqlite-utils memory taxi.csv --fast 12.71s user 0.48s system 104% cpu 12.627 total ``` Takes 13s, about the same time as calling `sqlite3 :memory: ...` directly, as seen in https://til.simonwillison.net/sqlite/one-line-csv-operations Without the `--fast` option that takes several minutes (262s, roughly 4m20s)! Here's the prototype so far: ```diff diff --git a/sqlite_utils/cli.py b/sqlite_utils/cli.py index 86eddfb..1c83ef6 100644 --- a/sqlite_utils/cli.py +++ b/sqlite_utils/cli.py @@ -14,6 +14,8 @@ import io import itertools import json import os +import shutil +import subprocess import sys import csv as csv_std import tabulate @@ -1669,6 +1671,7 @@ def query( is_flag=True, help="Analyze resulting tables and output results", ) +@click.option("--fast", is_flag=True, help="Fast mode, only works with CSV and TSV") @load_extension_option def memory( paths, @@ -1692,6 +1695,7 @@ def memory( save, analyze, load_extension, + fast, ): """Execute SQL query against an in-memory database, optionally populated by imported data @@ -1719,6 +1723,22 @@ def memory( \b sqlite-utils memory animals.csv --schema """ + if fast: + … | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | 944846776 | |
https://github.com/simonw/sqlite-utils/issues/297#issuecomment-1162179354 | https://api.github.com/repos/simonw/sqlite-utils/issues/297 | 1162179354 | IC_kwDOCGYnMM5FRXMa | 9599 | 2022-06-21T18:44:03Z | 2022-06-21T18:44:03Z | OWNER | The thing I like about that `--fast` option is that it could selectively use this alternative mechanism just for the files for which it can work (CSV and TSV files). I could also add a `--fast` option to `sqlite-utils memory` which could then kick in only for operations that involve just TSV and CSV files. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | 944846776 | |
https://github.com/simonw/sqlite-utils/issues/297#issuecomment-1161849874 | https://api.github.com/repos/simonw/sqlite-utils/issues/297 | 1161849874 | IC_kwDOCGYnMM5FQGwS | 9599 | 2022-06-21T14:49:12Z | 2022-06-21T14:49:12Z | OWNER | Since there are all sorts of existing options for `sqlite-utils insert` that won't work with this, maybe it would be better to have an entirely separate command - this for example: sqlite-utils fast-insert data.db mytable data.csv | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | 944846776 | |
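
The body of the `if fast:` branch is elided in the diff above, so the following is not the prototype itself. It is a minimal sketch of one way a fast path could hand CSV/TSV loading to the `sqlite3` command-line tool, following the `sqlite3 :memory: -cmd '.mode csv' -cmd '.import ...'` pattern from the linked TIL. The names `build_fast_command`, `run_fast` and `IMPORT_MODES` are hypothetical and do not come from sqlite-utils. The sketch also assumes a `sqlite3` binary is on `PATH` and that file and table names contain no whitespace.

```python
import os
import shutil
import subprocess

# Hypothetical mapping (not part of sqlite-utils): file extension -> sqlite3 CLI ".mode" value
IMPORT_MODES = {".csv": "csv", ".tsv": "tabs"}


def build_fast_command(path, table, sql, sqlite3_bin="sqlite3"):
    """Build the argument list for a one-shot sqlite3 CLI import and query.

    Assumes path and table contain no whitespace, since they are placed
    unquoted inside the .import dot-command.
    """
    ext = os.path.splitext(path)[1].lower()
    if ext not in IMPORT_MODES:
        raise ValueError("fast mode only works with CSV and TSV files")
    return [
        sqlite3_bin,
        ":memory:",
        "-cmd", ".mode {}".format(IMPORT_MODES[ext]),
        "-cmd", ".import {} {}".format(path, table),
        sql,
    ]


def run_fast(path, table, sql):
    """Run the query through the sqlite3 CLI and return its stdout as text."""
    sqlite3_bin = shutil.which("sqlite3")
    if sqlite3_bin is None:
        raise RuntimeError("sqlite3 command-line tool not found on PATH")
    result = subprocess.run(
        build_fast_command(path, table, sql, sqlite3_bin),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Keeping command construction separate from execution means the extension check (CSV and TSV only, as the `--fast` help text says) can be exercised without the `sqlite3` binary installed. A call such as `run_fast("taxi.csv", "taxi", "SELECT COUNT(*) FROM taxi")` would then return the CLI's output in CSV mode.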