github
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | pull_request | body | repo | type | active_lock_reason | performed_via_github_app | reactions | draft | state_reason |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
944846776 | MDU6SXNzdWU5NDQ4NDY3NzY= | 297 | Option for importing CSV data using the SQLite .import mechanism | 9599 | open | 0 | 23 | 2021-07-14T22:36:41Z | 2023-09-22T20:49:52Z | OWNER | As seen in https://til.simonwillison.net/sqlite/import-csv - `.mode csv` and then `.import school.csv schools` is hugely faster than importing via `sqlite-utils insert` and doing the work in Python - but it can only be implemented by shelling out to the `sqlite3` CLI tool, it's not functionality that is exposed to the Python `sqlite3` module. An option to use this would be useful - maybe something like this: sqlite-utils insert blah.db blah blah.csv --fast | 140912432 | issue | { "url": "https://api.github.com/repos/simonw/sqlite-utils/issues/297/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
||||||||
944326512 | MDU6SXNzdWU5NDQzMjY1MTI= | 296 | `table.search(..., quote=True)` parameter and `sqlite-utils search --quote` option | 32427188 | closed | 0 | 6 | 2021-07-14T11:26:47Z | 2021-08-18T20:13:12Z | 2021-08-18T20:10:48Z | NONE | Hi, Recently got this error: ``` Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/home/ethan/git/music-metadata-indexer/src/mmindexer/__init__.py", line 38, in <module> start("/home/ethan/git/music-metadata-indexer/sample", "/home/ethan/git/music-metadata-indexer/test.db") File "/home/ethan/git/music-metadata-indexer/src/mmindexer/__init__.py", line 23, in start scanner.build_database() File "/home/ethan/git/music-metadata-indexer/src/mmindexer/scan.py", line 79, in build_database _import_song(self.db, Path(dirpath).joinpath(f), self.logger) File "/home/ethan/git/music-metadata-indexer/src/mmindexer/scan.py", line 23, in _import_song db.add_song(filepath) File "/home/ethan/git/music-metadata-indexer/src/mmindexer/index.py", line 166, in add_song for match in self.search("albums", album): File "/home/ethan/git/music-metadata-indexer/env/lib/python3.9/site-packages/sqlite_utils/db.py", line 1625, in search cursor = self.db.execute( File "/home/ethan/git/music-metadata-indexer/env/lib/python3.9/site-packages/sqlite_utils/db.py", line 243, in execute return self.conn.execute(sql, parameters) sqlite3.OperationalError: fts5: syntax error near "." ``` So, the error seems to suggest there was a "." character somewhere in the SQL command that was causing the error. I did a little digging and found this in the docs: https://www.sqlite.org/fts5.html#fts5_strings. "." is one of the many prohibited characters. My solution was to just strip these out of the query using this line `query = query.translate({e: None for e in itertools.chain(range(0,26), range(27, 48), range(58,65), range(91,95), [96], range(123,128))})` Perhaps this could be included into the `table.search()` function? | 140912432 | issue | { "url": "https://api.github.com/repos/simonw/sqlite-utils/issues/296/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | ||||||
944870799 | MDU6SXNzdWU5NDQ4NzA3OTk= | 1394 | Big performance boost on faceting: skip the inner order by | 9599 | closed | 0 | 4 | 2021-07-14T23:32:29Z | 2021-07-16T02:23:32Z | 2021-07-15T00:05:50Z | OWNER | I just noticed something that could make for a huge performance improvement in faceting. The default query used by Datasette when faceting looks like this: ```sql select country_long, count(*) from ( select * from [global-power-plants] order by rowid ) where country_long is not null group by country_long order by count(*) desc ``` Here it takes 53ms: https://global-power-plants.datasettes.com/global-power-plants?sql=select%0D%0A++country_long%2C%0D%0A++count%28*%29%0D%0Afrom+%28%0D%0A++select+*+from+%5Bglobal-power-plants%5D+order+by+rowid%0D%0A%29%0D%0Awhere%0D%0A++country_long+is+not+null%0D%0Agroup+by%0D%0A++country_long%0D%0Aorder+by%0D%0A++count%28*%29+desc Note that there's a `order by rowid` in there which isn't necessary - the order on that inner query doesn't matter since we're grouping and counting. I had assumed SQLite would optimize this away - but it turns out it doesn't! Consider this version of the query, with that pointless order by removed: ``` select country_long, count(*) from ( select * from [global-power-plants] ) where country_long is not null group by country_long order by count(*) desc ``` https://global-power-plants.datasettes.com/global-power-plants?sql=select%0D%0A++country_long%2C%0D%0A++count%28*%29%0D%0Afrom+%28%0D%0A++select+*+from+%5Bglobal-power-plants%5D%0D%0A%29%0D%0Awhere%0D%0A++country_long+is+not+null%0D%0Agroup+by%0D%0A++country_long%0D%0Aorder+by%0D%0A++count%28*%29+desc runs in 7.2ms! I tried this optimization on a table with 2.5m rows in it - without the optimization it took 5 seconds, with the optimization it took 450ms. So this is a very significant improvement! | 107914493 | issue | { "url": "https://api.github.com/repos/simonw/datasette/issues/1394/reactions", "total_count": 2, "+1": 1, "-1": 0, "laugh": 0, "hooray": 1, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed |