github


id	node_id	number	title	user	state	locked	assignee	milestone	comments	created_at	updated_at	closed_at	author_association	pull_request	body	repo	type	active_lock_reason	performed_via_github_app	reactions	draft	state_reason
944846776	MDU6SXNzdWU5NDQ4NDY3NzY=	297	Option for importing CSV data using the SQLite .import mechanism	9599	open	0			23	2021-07-14T22:36:41Z	2023-09-22T20:49:52Z		OWNER		As seen in https://til.simonwillison.net/sqlite/import-csv - `.mode csv` and then `.import school.csv schools` is hugely faster than importing via `sqlite-utils insert` and doing the work in Python - but it can only be implemented by shelling out to the `sqlite3` CLI tool, it's not functionality that is exposed to the Python `sqlite3` module. An option to use this would be useful - maybe something like this: sqlite-utils insert blah.db blah blah.csv --fast	140912432	issue			{ "url": "https://api.github.com/repos/simonw/sqlite-utils/issues/297/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }
944326512	MDU6SXNzdWU5NDQzMjY1MTI=	296	`table.search(..., quote=True)` parameter and `sqlite-utils search --quote` option	32427188	closed	0			6	2021-07-14T11:26:47Z	2021-08-18T20:13:12Z	2021-08-18T20:10:48Z	NONE		Hi, Recently got this error: ``` Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/home/ethan/git/music-metadata-indexer/src/mmindexer/__init__.py", line 38, in <module> start("/home/ethan/git/music-metadata-indexer/sample", "/home/ethan/git/music-metadata-indexer/test.db") File "/home/ethan/git/music-metadata-indexer/src/mmindexer/__init__.py", line 23, in start scanner.build_database() File "/home/ethan/git/music-metadata-indexer/src/mmindexer/scan.py", line 79, in build_database _import_song(self.db, Path(dirpath).joinpath(f), self.logger) File "/home/ethan/git/music-metadata-indexer/src/mmindexer/scan.py", line 23, in _import_song db.add_song(filepath) File "/home/ethan/git/music-metadata-indexer/src/mmindexer/index.py", line 166, in add_song for match in self.search("albums", album): File "/home/ethan/git/music-metadata-indexer/env/lib/python3.9/site-packages/sqlite_utils/db.py", line 1625, in search cursor = self.db.execute( File "/home/ethan/git/music-metadata-indexer/env/lib/python3.9/site-packages/sqlite_utils/db.py", line 243, in execute return self.conn.execute(sql, parameters) sqlite3.OperationalError: fts5: syntax error near "." ``` So, the error seems to suggest there was a "." character somewhere in the SQL command that was causing the error. I did a little digging and found this in the docs: https://www.sqlite.org/fts5.html#fts5_strings. "." is one of the many prohibited characters. My solution was to just strip these out of the query using this line `query = query.translate({e: None for e in itertools.chain(range(0,26), range(27, 48), range(58,65), range(91,95), [96], range(123,128))})` Perhaps this could be included into the `table.search()` function?	140912432	issue			{ "url": "https://api.github.com/repos/simonw/sqlite-utils/issues/296/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		completed
944870799	MDU6SXNzdWU5NDQ4NzA3OTk=	1394	Big performance boost on faceting: skip the inner order by	9599	closed	0			4	2021-07-14T23:32:29Z	2021-07-16T02:23:32Z	2021-07-15T00:05:50Z	OWNER		I just noticed something that could make for a huge performance improvement in faceting. The default query used by Datasette when faceting looks like this: ```sql select country_long, count(*) from ( select * from [global-power-plants] order by rowid ) where country_long is not null group by country_long order by count(*) desc ``` Here it takes 53ms: https://global-power-plants.datasettes.com/global-power-plants?sql=select%0D%0A++country_long%2C%0D%0A++count%28%29%0D%0Afrom+%28%0D%0A++select++from+%5Bglobal-power-plants%5D+order+by+rowid%0D%0A%29%0D%0Awhere%0D%0A++country_long+is+not+null%0D%0Agroup+by%0D%0A++country_long%0D%0Aorder+by%0D%0A++count%28%29+desc Note that there's a `order by rowid` in there which isn't necessary - the order on that inner query doesn't matter since we're grouping and counting. I had assumed SQLite would optimize this away - but it turns out it doesn't! Consider this version of the query, with that pointless order by removed: ``` select country_long, count(*) from ( select * from [global-power-plants] ) where country_long is not null group by country_long order by count(*) desc ``` https://global-power-plants.datasettes.com/global-power-plants?sql=select%0D%0A++country_long%2C%0D%0A++count%28%29%0D%0Afrom+%28%0D%0A++select++from+%5Bglobal-power-plants%5D%0D%0A%29%0D%0Awhere%0D%0A++country_long+is+not+null%0D%0Agroup+by%0D%0A++country_long%0D%0Aorder+by%0D%0A++count%28%29+desc runs in 7.2ms! I tried this optimization on a table with 2.5m rows in it - without the optimization it took 5 seconds, with the optimization it took 450ms. So this is a very significant improvement!	107914493	issue			{ "url": "https://api.github.com/repos/simonw/datasette/issues/1394/reactions", "total_count": 2, "+1": 1, "-1": 0, "laugh": 0, "hooray": 1, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		completed
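
The faceting comparison described in issue 1394 can be tried locally with a small sketch like the one below. It is not Datasette's implementation: the `plants` table, the `build_demo_db` and `run_facet` helpers, and the row count are all hypothetical stand-ins for the `global-power-plants` data. It runs the facet query with and without the inner `order by rowid`, checks that both return the same counts, and prints the elapsed time for each. Timings depend on the SQLite version bundled with Python, so the gap reported in the issue may be smaller or absent on newer builds.

```python
import sqlite3
import time


def build_demo_db(rows=50_000):
    """Create an in-memory table standing in for global-power-plants (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table plants (name text, country_long text)")
    countries = ["France", "Brazil", "India", "Kenya", None]
    conn.executemany(
        "insert into plants values (?, ?)",
        ((f"plant-{i}", countries[i % len(countries)]) for i in range(rows)),
    )
    return conn


# Facet query shaped like the one in the issue; {inner} is the subquery under test.
# The extra tie-breaker on country_long only makes the output order deterministic.
FACET_SQL = """
select country_long, count(*) as n
from ({inner})
where country_long is not null
group by country_long
order by n desc, country_long
"""


def run_facet(conn, inner):
    """Run the facet query around `inner`; return (rows, elapsed milliseconds)."""
    start = time.perf_counter()
    rows = conn.execute(FACET_SQL.format(inner=inner)).fetchall()
    return rows, (time.perf_counter() - start) * 1000


conn = build_demo_db()
with_order, ms_with = run_facet(conn, "select * from plants order by rowid")
without_order, ms_without = run_facet(conn, "select * from plants")

print(f"with inner order by:    {ms_with:.1f} ms")
print(f"without inner order by: {ms_without:.1f} ms")
print("same facet counts:", with_order == without_order)
```

Because the outer query groups and counts, the inner ordering cannot change the result, which is why both variants are compared for equality before looking at the timings.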