github
html_url | issue_url | id | node_id | user | created_at | updated_at | author_association | body | reactions | issue | performed_via_github_app |
---|---|---|---|---|---|---|---|---|---|---|---|
https://github.com/simonw/datasette/issues/266#issuecomment-389570841 | https://api.github.com/repos/simonw/datasette/issues/266 | 389570841 | MDEyOklzc3VlQ29tbWVudDM4OTU3MDg0MQ== | 9599 | 2018-05-16T15:54:49Z | 2018-06-15T07:41:09Z | OWNER | At the most basic level, this will work based on an extension. Most places you currently put a `.json` extension should also allow a `.csv` extension. By default this will return the exact results you see on the current page (the default maximum will remain 1000). ## Streaming all records Where things get interesting is *streaming mode*. This will be an option which returns ALL matching records as a streaming CSV file, even if that ends up being millions of records. I think the best way to build this will be on top of the existing mechanism used to efficiently implement keyset pagination via `_next=` tokens. ## Expanding foreign keys For tables with foreign key references it would be useful if the CSV format could expand those references to include the labels from `label_column` - maybe via an additional `?_expand=1` option. When expanding, each foreign key column will be shown twice: `rowid,city_id,city_id_label,state` (see the expansion sketch after this table). | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | 323681589 | |
https://github.com/simonw/datasette/issues/266#issuecomment-389626715 | https://api.github.com/repos/simonw/datasette/issues/266 | 389626715 | MDEyOklzc3VlQ29tbWVudDM4OTYyNjcxNQ== | 9599 | 2018-05-16T18:50:46Z | 2018-05-16T18:50:46Z | OWNER | > I’d recommend using the Windows-1252 encoding for maximum compatibility, unless you have any characters not in that set, in which case use UTF8 with a byte order mark. Bit of a pain, but some programs (e.g. various versions of Excel) don’t read UTF8. **frankieroberto** https://twitter.com/frankieroberto/status/996823071947460616 > There is software that consumes CSV and doesn't speak UTF8!? Huh. Well I can't just use Windows-1252 because I need to support the full UTF8 range of potential data - maybe I should support an optional `?_encoding=windows-1252` argument **simonw** https://twitter.com/simonw/status/996824677245857793 (see the encoding sketch after this table) | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | 323681589 | |
https://github.com/simonw/datasette/issues/266#issuecomment-389608473 | https://api.github.com/repos/simonw/datasette/issues/266 | 389608473 | MDEyOklzc3VlQ29tbWVudDM4OTYwODQ3Mw== | 9599 | 2018-05-16T17:52:35Z | 2018-05-16T17:54:11Z | OWNER | There are some code examples in this issue which should help with the streaming part: https://github.com/channelcat/sanic/issues/1067 Also https://github.com/channelcat/sanic/blob/master/docs/sanic/streaming.md#response-streaming (see the Sanic sketch after this table) | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | 323681589 | |
https://github.com/simonw/datasette/issues/266#issuecomment-389592566 | https://api.github.com/repos/simonw/datasette/issues/266 | 389592566 | MDEyOklzc3VlQ29tbWVudDM4OTU5MjU2Ng== | 9599 | 2018-05-16T17:01:29Z | 2018-05-16T17:02:21Z | OWNER | Let's provide a CSV Dialect definition too: https://frictionlessdata.io/specs/csv-dialect/ - via https://twitter.com/drewdaraabrams/status/996794915680997382 (see the dialect sketch after this table) | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | 323681589 | |
https://github.com/simonw/datasette/issues/266#issuecomment-389579762 | https://api.github.com/repos/simonw/datasette/issues/266 | 389579762 | MDEyOklzc3VlQ29tbWVudDM4OTU3OTc2Mg== | 9599 | 2018-05-16T16:21:12Z | 2018-05-16T16:21:12Z | OWNER | > I basically want someone to tell me which arguments I can pass to Python's csv.writer() function that will result in the least complaints from people who try to parse the results :) https://twitter.com/simonw/status/996786815938977792 (see the csv.writer sketch after this table) | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | 323681589 | |
https://github.com/simonw/datasette/issues/266#issuecomment-389579363 | https://api.github.com/repos/simonw/datasette/issues/266 | 389579363 | MDEyOklzc3VlQ29tbWVudDM4OTU3OTM2Mw== | 9599 | 2018-05-16T16:20:06Z | 2018-05-16T16:20:06Z | OWNER | I started a thread on Twitter discussing various CSV output dialects: https://twitter.com/simonw/status/996783395504979968 - I want to pick defaults which will work as well as possible for whatever tools people might be using to consume the data. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | 323681589 | |
https://github.com/simonw/datasette/issues/266#issuecomment-389572201 | https://api.github.com/repos/simonw/datasette/issues/266 | 389572201 | MDEyOklzc3VlQ29tbWVudDM4OTU3MjIwMQ== | 9599 | 2018-05-16T15:58:43Z | 2018-05-16T16:00:47Z | OWNER | This will likely be implemented in the `BaseView` class, which needs to know how to spot the `.csv` extension, call the underlying JSON generating function and then return the `columns` and `rows` as correctly formatted CSV. https://github.com/simonw/datasette/blob/9959a9e4deec8e3e178f919e8b494214d5faa7fd/datasette/views/base.py#L201-L207 This means it will take ALL arguments that are available to the `.json` view. It may ignore some (e.g. `_facet=` makes no sense, since CSV tables don't have space to show the facet results). In streaming mode things will behave a little differently - in particular, if `_stream=1` then `_next=` will be forbidden. The response can't include a length header, because we don't know how many bytes the CSV will be. CSV output will throw an error if the endpoint's JSON doesn't have `rows` and `columns` keys, e.g. `/-/inspect.json`. So the implementation: looks for the `.csv` extension; internally fetches the `.json` data instead; if no `_stream` it simply converts that JSON to CSV with the correct content type header; if `_stream=1` it checks for `_next=` and throws an error if it was provided - otherwise it fetches the first page, emits the CSV header and the first set of rows, then starts async looping, emitting more CSV rows and following the `_next=` internal reference until done (see the streaming sketch after this table). I like that this takes advantage of efficient pagination. It may not work so well for views which use offset/limit though. It won't work at all for custom SQL, because custom SQL doesn't support `_next=` pagination. That's fine. For views the easiest fix is to cut off after the first X000 records. That seems OK. The view JSON would need to include a property that the mechanism can identify. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | 323681589 | |
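A minimal sketch of the foreign key expansion described in the first comment above, assuming a hypothetical `city_labels` dict standing in for the related table's `label_column` values. With `?_expand=1` each foreign key column appears twice: the raw value, then its label.

```python
import csv
import io

# Hypothetical label lookup: in Datasette these labels would come from
# the related table's label_column, not a hard-coded dict.
city_labels = {1: "San Francisco", 2: "Los Angeles"}

rows = [
    (1, 1, "CA"),
    (2, 2, "CA"),
]

output = io.StringIO()
writer = csv.writer(output)
# With ?_expand=1 each foreign key column is shown twice:
# the raw value, then its label.
writer.writerow(["rowid", "city_id", "city_id_label", "state"])
for rowid, city_id, state in rows:
    writer.writerow([rowid, city_id, city_labels.get(city_id, ""), state])

print(output.getvalue())
# rowid,city_id,city_id_label,state
# 1,1,San Francisco,CA
# 2,2,Los Angeles,CA
```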
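A sketch of the optional `?_encoding=windows-1252` idea from the encoding discussion. The parameter name comes from the tweet quoted above; the helper itself is hypothetical, not Datasette's API. UTF-8 output aimed at Excel can be prefixed with a byte order mark, and legacy output replaces characters outside the target repertoire rather than erroring partway through a stream.

```python
# Hypothetical helper sketching the ?_encoding= behaviour.
def encode_csv_chunk(chunk: str, encoding: str = "utf-8") -> bytes:
    if encoding == "utf-8":
        return chunk.encode("utf-8")
    # Replace anything outside the target repertoire instead of
    # crashing partway through a streamed response.
    return chunk.encode(encoding, errors="replace")

# UTF-8 aimed at Excel can be prefixed with a byte order mark:
BOM = "\ufeff".encode("utf-8")  # b'\xef\xbb\xbf'

utf8_body = BOM + encode_csv_chunk("name,city\r\nJosé,München\r\n")
legacy_body = encode_csv_chunk("name,city\r\nJosé,München\r\n", "windows-1252")
```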
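A response-streaming sketch adapted from the Sanic docs linked in that comment, using Sanic's 2018-era `sanic.response.stream` API (newer Sanic versions stream differently); the route and CSV payload here are illustrative.

```python
from sanic import Sanic
from sanic.response import stream

app = Sanic(__name__)

@app.route("/table.csv")
async def table_csv(request):
    async def stream_csv(response):
        await response.write("rowid,city_id,state\r\n")
        # The real implementation would fetch a page of rows, write
        # them, then follow the _next= token for the next page.
        await response.write("1,1,CA\r\n")

    return stream(stream_csv, content_type="text/csv; charset=utf-8")
```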
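A sketch of a CSV Dialect description per the frictionlessdata spec linked above. The key names follow that spec as I understand it; the values shown describe the output of Python's default `excel` dialect.

```python
import json

# Key names follow the CSV Dialect spec; the values here describe
# what Python's default "excel" dialect produces.
dialect = {
    "csvddfVersion": 1.2,
    "delimiter": ",",
    "lineTerminator": "\r\n",
    "quoteChar": "\"",
    "doubleQuote": True,
    "header": True,
}

print(json.dumps(dialect, indent=2))
```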
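As for which arguments to pass to Python's `csv.writer()`: its default `excel` dialect already produces comma-delimited, minimally quoted, CRLF-terminated output, which matches what most consumers expect. Spelling those defaults out explicitly:

```python
import csv
import io

output = io.StringIO()
writer = csv.writer(
    output,
    delimiter=",",
    quotechar='"',
    doublequote=True,           # escape embedded quotes by doubling them
    quoting=csv.QUOTE_MINIMAL,  # quote only fields that need it
    lineterminator="\r\n",      # RFC 4180 line endings
)
writer.writerow(["id", "note"])
writer.writerow([1, 'contains "quotes", commas'])
print(output.getvalue())
```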
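Finally, a minimal sketch of the streaming loop from the implementation comment. `fetch_json_page()` is a hypothetical stand-in for the internal call that renders one keyset page of the `.json` view; the real code would live in `BaseView`.

```python
import csv
import io

# fetch_json_page(token) is assumed to return a dict shaped like the
# .json view output: {"columns": [...], "rows": [...], "next": token-or-None}.
async def stream_all_rows(fetch_json_page):
    """Async generator yielding CSV text chunks for every matching row."""
    next_token = None
    first_page = True
    while True:
        page = await fetch_json_page(next_token)
        buffer = io.StringIO()
        writer = csv.writer(buffer)
        if first_page:
            writer.writerow(page["columns"])  # emit the header only once
            first_page = False
        writer.writerows(page["rows"])
        yield buffer.getvalue()
        next_token = page.get("next")  # follow the _next= reference
        if not next_token:
            break  # no more pages: the stream is complete
```

A caller would `async for chunk in stream_all_rows(fetch)` and write each chunk to the streaming response, which is how this would slot into the Sanic sketch above.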