github
### [Comment 1247161510](https://github.com/simonw/sqlite-utils/issues/297#issuecomment-1247161510)

user 9599 (OWNER) · issue 944846776 · node_id `IC_kwDOCGYnMM5KViym` · created 2022-09-14T18:39:50Z · updated 2022-09-14T18:39:50Z · reactions: none

Wrote that up as a TIL: https://til.simonwillison.net/python/pypy-macos
### [Comment 1247149969](https://github.com/simonw/sqlite-utils/issues/297#issuecomment-1247149969)

user 9599 (OWNER) · issue 944846776 · node_id `IC_kwDOCGYnMM5KVf-R` · created 2022-09-14T18:28:53Z · updated 2022-09-14T18:29:34Z · reactions: none

As an aside, https://avi.im/blag/2021/fast-sqlite-inserts/ inspired me to try pypy, since that article claimed to get a 2.5x speedup using pypy compared to regular Python for a CSV import script.

Setup:

```
brew install pypy3
cd /tmp
pypy3 -m venv venv
source venv/bin/activate
pip install sqlite-utils
```

I grabbed the first 760M of that `https://static.openfoodfacts.org/data/en.openfoodfacts.org.products.csv` file (didn't wait for the whole thing to download). Then:

```
time sqlite-utils insert pypy.db t en.openfoodfacts.org.products.csv --csv
  [------------------------------------]    0%
  [###################################-]   99%
11.76s user 2.26s system 93% cpu 14.981 total
```

Compared to regular Python `sqlite-utils` doing the same thing:

```
time sqlite-utils insert py.db t en.openfoodfacts.org.products.csv --csv
  [------------------------------------]    0%
  [###################################-]   99%
11.36s user 2.06s system 93% cpu 14.341 total
```

So no perceivable performance difference.
### [Comment 1246978641](https://github.com/simonw/sqlite-utils/issues/297#issuecomment-1246978641)

user 9599 (OWNER) · issue 944846776 · node_id `IC_kwDOCGYnMM5KU2JR` · created 2022-09-14T15:57:41Z · updated 2022-09-14T15:57:41Z · reactions: none

One solution suggested on Discord:

```
wget https://static.openfoodfacts.org/data/en.openfoodfacts.org.products.csv

CREATE=`curl -s -L https://gist.githubusercontent.com/CharlesNepote/80fb813a416ad445fdd6e4738b4c8156/raw/032af70de631ff1c4dd09d55360f242949dcc24f/create.sql`
INDEX=`curl -s -L https://gist.githubusercontent.com/CharlesNepote/80fb813a416ad445fdd6e4738b4c8156/raw/032af70de631ff1c4dd09d55360f242949dcc24f/index.sql`

time sqlite3 products_new.db <<EOS
/* Optimisations. See: https://avi.im/blag/2021/fast-sqlite-inserts/ */;
PRAGMA page_size = 32768;
$CREATE
.mode ascii
.separator "\t" "\n"
.import --skip 1 en.openfoodfacts.org.products.csv all
$INDEX
EOS

# Converting empty to NULL for columns which are either FLOAT or INTEGER
time sqlite3 products.db ".schema all" \
  | sed -nr 's/.*\[(.*)\] (INTEGER|FLOAT).*/\1/gp' \
  | xargs -I % sqlite3 products.db -cmd "PRAGMA journal_mode=OFF;" "UPDATE [all] SET [%] = NULL WHERE [%] = '';"
```
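The core of the Discord recipe (raising `page_size`, turning the journal off, bulk-importing a tab-separated file) can be sketched with nothing but Python's stdlib `sqlite3` and `csv` modules. This is a rough equivalent under stated assumptions, not the script itself: `fast_import` is a hypothetical name, the `\t` delimiter matches the `.separator` line above, and the sketch creates typeless columns rather than running the gist's `create.sql`:

```python
import csv
import sqlite3


def fast_import(db_path, csv_path, table, delimiter="\t"):
    """Bulk-load a delimited file into a fresh SQLite table."""
    conn = sqlite3.connect(db_path)
    # page_size only takes effect before the first table is created
    conn.execute("PRAGMA page_size = 32768")
    conn.execute("PRAGMA journal_mode = OFF")
    with open(csv_path, newline="") as f:
        reader = csv.reader(f, delimiter=delimiter)
        headers = next(reader)  # mirrors ".import --skip 1"
        cols = ", ".join(f"[{h}]" for h in headers)
        qs = ", ".join("?" for _ in headers)
        conn.execute(f"CREATE TABLE IF NOT EXISTS [{table}] ({cols})")
        # executemany consumes the reader lazily, row by row
        conn.executemany(f"INSERT INTO [{table}] VALUES ({qs})", reader)
    conn.commit()
    conn.close()
```

As with the shell version, `journal_mode = OFF` trades crash safety for speed, which is fine for a throwaway one-shot import.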
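The final cleanup step above (the `.schema` / `sed` / `xargs` pipeline that converts empty strings to NULL in INTEGER and FLOAT columns) can likewise be sketched in stdlib Python. `blank_to_null` is a hypothetical name; instead of parsing `.schema` output with `sed`, it reads declared column types from `PRAGMA table_info`:

```python
import sqlite3


def blank_to_null(db_path, table):
    """Set empty-string values to NULL in numeric-typed columns."""
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode = OFF")
    # table_info rows are (cid, name, type, notnull, dflt_value, pk);
    # keep columns declared INTEGER or FLOAT, like the sed pattern
    cols = [
        row[1]
        for row in conn.execute(f"PRAGMA table_info([{table}])")
        if row[2].upper() in ("INTEGER", "FLOAT")
    ]
    with conn:
        for col in cols:
            conn.execute(
                f"UPDATE [{table}] SET [{col}] = NULL WHERE [{col}] = ''"
            )
    conn.close()
```

Running everything in a single connection also avoids spawning one `sqlite3` process per column, which the `xargs -I %` version does.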