github

id	node_id	number	title	user	state	locked	assignee	milestone	comments	created_at	updated_at	closed_at	author_association	pull_request	body	repo	type	active_lock_reason	performed_via_github_app	reactions	draft	state_reason
1084193403	PR_kwDOBm6k_c4wDKmb	1574	introduce new option for datasette package to use a slim base image	33631	closed	0			6	2021-12-19T21:18:19Z	2022-08-15T08:49:31Z	2022-08-15T08:49:31Z	NONE	simonw/datasette/pulls/1574	The official python images on docker hub come with a slim variant that is significantly smaller than the default. The diff does not change the default, but allows switching to the `slim` variant with a command-line switch (`--slim-base-image`). Size comparison: ``` $ datasette package some.db -t fat --install "datasette-basemap datasette-cluster-map" $ datasette package some.db -t slim --slim-base-image --install "datasette-basemap datasette-cluster-map" $ docker images REPOSITORY TAG IMAGE ID CREATED SIZE fat latest 807b393ace0d 9 seconds ago 978MB slim latest 31bc5e63505c 8 minutes ago 191MB ```	107914493	pull			{ "url": "https://api.github.com/repos/simonw/datasette/issues/1574/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	0
395236066	MDU6SXNzdWUzOTUyMzYwNjY=	393	CSV export in "Advanced export" pane doesn't respect query	1727065	closed	0			6	2019-01-02T12:39:41Z	2021-06-17T18:14:24Z	2019-01-03T02:44:10Z	NONE		It looks like there's an inconsistency when exporting to CSV via the web interface. Say I'm looking at [songs released in 1989](https://fivethirtyeight.datasettes.com/fivethirtyeight-c300360/classic-rock%2Fclassic-rock-song-list?Release+Year__exact=1989) in the `classic-rock/classic-rock-song-list` table from the Five Thirty Eight data. The JSON and CSV export links at the top of the page both give me filtered data using `Release+Year__exact=1989` in the URL. In the `Advanced export` tab, though, the CSV option gives me the whole data set, while the JSON options preserve the query. It may be that this is intended behaviour related to the streaming CSV stuff [discussed here](https://github.com/simonw/datasette/issues/266), but if that's the case then I think it should be a little clearer.	107914493	issue			{ "url": "https://api.github.com/repos/simonw/datasette/issues/393/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		completed
826613352	MDExOlB1bGxSZXF1ZXN0NTg4NjAxNjI3	1254	Update Docker Spatialite version to 5.0.1 + add support for Spatialite topology functions	3200608	closed	0			6	2021-03-09T20:49:08Z	2021-03-10T18:27:45Z	2021-03-09T22:04:23Z	NONE	simonw/datasette/pulls/1254	This requires adding the RT Topology library (Spatialite changed to RT Topology from LWGEOM between 4.4 and 5.0), as well as upgrading the GEOS version (which is the reason for switching to `python:3.7.10-slim-buster` as the base image.) `autoconf` and `libtool` are added to build RT Topology, and Spatialite is now built with `--disable-minizip` (minizip wasn't an option in 4.4 and I didn't want to add another dependency) and `--disable-dependency-tracking` which, according to Spatialite, "speeds up one-time builds"	107914493	pull			{ "url": "https://api.github.com/repos/simonw/datasette/issues/1254/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	0
573583971	MDU6SXNzdWU1NzM1ODM5NzE=	689	"Templates considered" comment broken in >=0.35	35075	closed	0			6	2020-03-01T17:31:21Z	2020-04-05T19:39:44Z	2020-04-05T19:39:44Z	NONE		Noticed that the "Templates Considered" comment is missing in 0.37. Believe I traced it back to #664 as you can see it in https://v0-34.datasette.io/ but not https://v0-35.datasette.io/. Looking at the template context debug between the two you can see what is missing from 0.35 vs. 0.34: ```diff < "datasette_version": "0.34", < "app_css_hash": "ffa51a", < "select_templates": [ < "*index.html" < ], < "zip": "<class 'zip'>", < "body_scripts": [], < "extra_css_urls": "<generator object BaseView._asset_urls at 0x7f6529ac05f0>", < "extra_js_urls": "<generator object BaseView._asset_urls at 0x7f6529ac0660>", < "format_bytes": "<function format_bytes at 0x7f652a1588b0>", < "database_url": "<bound method BaseView.database_url of <datasette.views.index.IndexView object at 0x7f6529b03e50>>", < "database_color": "<bound method BaseView.database_color of <datasette.views.index.IndexView object at 0x7f6529b03e50>>" --- > "datasette_version": "0.35", > "database_url": "<bound method BaseView.database_url of <datasette.views.index.IndexView object at 0x7f6140dacd90>>", > "database_color": "<bound method BaseView.database_color of <datasette.views.index.IndexView object at 0x7f6140dacd90>>" ```	107914493	issue			{ "url": "https://api.github.com/repos/simonw/datasette/issues/689/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		completed
512996469	MDU6SXNzdWU1MTI5OTY0Njk=	607	Ways to improve fuzzy search speed on larger data sets?	8431341	closed	0			6	2019-10-27T17:31:37Z	2019-11-07T03:38:10Z	2019-11-07T03:38:10Z	NONE		I have an SQLite table with 16 million rows in it. Having read @simonw's article "[Fast Autocomplete Search for Your Website](https://24ways.org/2018/fast-autocomplete-search-for-your-website/)" I was curious to try datasette to see what kind of query performance I could get out of it. In truth I don't need to do full text search since all I would like to do is give my users a way to search for the names of investors such as "Warren Buffett", or "Tim Cook" (whose names are in a single column). On the first search, Datasette takes over 20 seconds to return all records associated with `elon musk`: > ![image](https://user-images.githubusercontent.com/8431341/67638889-a86e1100-f8b7-11e9-9f7e-a9d13a42e988.png) > ![image](https://user-images.githubusercontent.com/8431341/67638825-ed457800-f8b6-11e9-94d1-b44f1a40ee8c.png) If I rerun the same search, it then takes almost 9 seconds: > ![image](https://user-images.githubusercontent.com/8431341/67638908-e4a17180-f8b7-11e9-9d00-748c80ef1f21.png) That's far too slow to implement an autocomplete feature. I could reduce the latency by making a special table of only unique investor names, thereby reducing the search space to less than a million rows (then I'd need to implement a way to add only new investor names to the table as I received new data, about 4,000 rows a day). If I did that, I'm still concerned the new table wouldn't be lean enough to look up investor names quickly. Plus, even if I can implement the autocomplete feature, I would still finally have to look up records for that investor, which would take between 8 and 20 seconds. Are there any tricks for speeding this up? 
Here's my hardware: > ![image](https://user-images.githubusercontent.com/8431341/67638861-55945980-f8b7-11e9-96a8-ca76c7c68c5d.png)	107914493	issue			{ "url": "https://api.github.com/repos/simonw/datasette/issues/607/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		completed
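
The "special table of only unique investor names" idea from issue 607 can be sketched roughly as below. This is a minimal illustration, not anything taken from the issue thread: the table name `filings`, the column `investor_name`, and the helper names are all hypothetical, and it swaps fuzzy matching for a plain case-insensitive prefix lookup that an ordinary SQLite index can serve.

```python
import sqlite3


def build_name_index(conn, source_table="filings", name_column="investor_name"):
    """Create or refresh a deduplicated name table (hypothetical schema)."""
    # The NOCASE primary key gives a case-insensitive index on the names.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS investor_names "
        "(name TEXT PRIMARY KEY COLLATE NOCASE)"
    )
    # INSERT OR IGNORE only adds names that are not present yet, so the
    # same call can be rerun after each daily load of new rows.
    conn.execute(
        f'INSERT OR IGNORE INTO investor_names (name) '
        f'SELECT DISTINCT "{name_column}" FROM "{source_table}" '
        f'WHERE "{name_column}" IS NOT NULL'
    )
    # Index the big table too, so fetching all records for one chosen
    # investor does not need a full scan.
    conn.execute(
        f'CREATE INDEX IF NOT EXISTS idx_source_name '
        f'ON "{source_table}" ("{name_column}" COLLATE NOCASE)'
    )
    conn.commit()


def autocomplete(conn, prefix, limit=10):
    """Return up to `limit` names starting with `prefix`, ignoring ASCII case."""
    # A range comparison (instead of LIKE) lets SQLite walk the index directly.
    upper = prefix + "\uffff"
    rows = conn.execute(
        "SELECT name FROM investor_names "
        "WHERE name >= ? AND name < ? ORDER BY name LIMIT ?",
        (prefix, upper, limit),
    )
    return [row[0] for row in rows]
```

Whether this is fast enough on 16 million rows depends on the hardware and data; the point is only that both the autocomplete and the per-investor lookup become index lookups rather than table scans.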