github
html_url | issue_url | id | node_id | user | created_at | updated_at | author_association | body | reactions | issue | performed_via_github_app |
---|---|---|---|---|---|---|---|---|---|---|---|
https://github.com/simonw/datasette/issues/266#issuecomment-389570841 | https://api.github.com/repos/simonw/datasette/issues/266 | 389570841 | MDEyOklzc3VlQ29tbWVudDM4OTU3MDg0MQ== | 9599 | 2018-05-16T15:54:49Z | 2018-06-15T07:41:09Z | OWNER | At the most basic level, this will work based on an extension. Most places you currently put a `.json` extension should also allow a `.csv` extension. By default this will return the exact results you see on the current page (the default maximum will remain 1000). ## Streaming all records Where things get interesting is *streaming mode*. This will be an option which returns ALL matching records as a streaming CSV file, even if that ends up being millions of records. I think the best way to build this will be on top of the existing mechanism used to efficiently implement keyset pagination via `_next=` tokens. ## Expanding foreign keys For tables with foreign key references it would be useful if the CSV format could expand those references to include the labels from `label_column` - maybe via an additional `?_expand=1` option. When expanding, each foreign key column will be shown twice: `rowid,city_id,city_id_label,state` (see the expansion sketch after this table). | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | 323681589 | |
https://github.com/simonw/datasette/issues/266#issuecomment-389626715 | https://api.github.com/repos/simonw/datasette/issues/266 | 389626715 | MDEyOklzc3VlQ29tbWVudDM4OTYyNjcxNQ== | 9599 | 2018-05-16T18:50:46Z | 2018-05-16T18:50:46Z | OWNER | > I’d recommend using the Windows-1252 encoding for maximum compatibility, unless you have any characters not in that set, in which case use UTF8 with a byte order mark. Bit of a pain, but some programs (e.g. various versions of Excel) don’t read UTF8. **frankieroberto** https://twitter.com/frankieroberto/status/996823071947460616 > There is software that consumes CSV and doesn't speak UTF8!? Huh. Well I can't just use Windows-1252 because I need to support the full UTF8 range of potential data - maybe I should support an optional `?_encoding=windows-1252` argument **simonw** https://twitter.com/simonw/status/996824677245857793 (see the encoding sketch after this table) | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | 323681589 | |
https://github.com/simonw/datasette/issues/266#issuecomment-389608473 | https://api.github.com/repos/simonw/datasette/issues/266 | 389608473 | MDEyOklzc3VlQ29tbWVudDM4OTYwODQ3Mw== | 9599 | 2018-05-16T17:52:35Z | 2018-05-16T17:54:11Z | OWNER | There are some code examples in this issue which should help with the streaming part: https://github.com/channelcat/sanic/issues/1067 Also https://github.com/channelcat/sanic/blob/master/docs/sanic/streaming.md#response-streaming (see the Sanic sketch after this table) | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | 323681589 | |
https://github.com/simonw/datasette/issues/266#issuecomment-389592566 | https://api.github.com/repos/simonw/datasette/issues/266 | 389592566 | MDEyOklzc3VlQ29tbWVudDM4OTU5MjU2Ng== | 9599 | 2018-05-16T17:01:29Z | 2018-05-16T17:02:21Z | OWNER | Let's provide a CSV Dialect definition too: https://frictionlessdata.io/specs/csv-dialect/ - via https://twitter.com/drewdaraabrams/status/996794915680997382 (see the dialect sketch after this table) | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | 323681589 | |
https://github.com/simonw/datasette/issues/266#issuecomment-389579762 | https://api.github.com/repos/simonw/datasette/issues/266 | 389579762 | MDEyOklzc3VlQ29tbWVudDM4OTU3OTc2Mg== | 9599 | 2018-05-16T16:21:12Z | 2018-05-16T16:21:12Z | OWNER | > I basically want someone to tell me which arguments I can pass to Python's csv.writer() function that will result in the least complaints from people who try to parse the results :) https://twitter.com/simonw/status/996786815938977792 (see the csv.writer sketch after this table) | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | 323681589 | |
https://github.com/simonw/datasette/issues/266#issuecomment-389579363 | https://api.github.com/repos/simonw/datasette/issues/266 | 389579363 | MDEyOklzc3VlQ29tbWVudDM4OTU3OTM2Mw== | 9599 | 2018-05-16T16:20:06Z | 2018-05-16T16:20:06Z | OWNER | I started a thread on Twitter discussing various CSV output dialects: https://twitter.com/simonw/status/996783395504979968 - I want to pick defaults which will work as well as possible for whatever tools people might be using to consume the data. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | 323681589 | |
https://github.com/simonw/datasette/issues/266#issuecomment-389572201 | https://api.github.com/repos/simonw/datasette/issues/266 | 389572201 | MDEyOklzc3VlQ29tbWVudDM4OTU3MjIwMQ== | 9599 | 2018-05-16T15:58:43Z | 2018-05-16T16:00:47Z | OWNER | This will likely be implemented in the `BaseView` class, which needs to know how to spot the `.csv` extension, call the underlying JSON generating function and then return the `columns` and `rows` as correctly formatted CSV. https://github.com/simonw/datasette/blob/9959a9e4deec8e3e178f919e8b494214d5faa7fd/datasette/views/base.py#L201-L207 This means it will take ALL arguments that are available to the `.json` view. It may ignore some (e.g. `_facet=` makes no sense, since CSV tables don't have space to show the facet results). In streaming mode things will behave a little differently - in particular, if `_stream=1` then `_next=` will be forbidden. The response can't include a length header, because we don't know how many bytes the CSV will be. CSV output will throw an error if the endpoint's JSON doesn't have `rows` and `columns` keys, e.g. `/-/inspect.json`. So the implementation: looks for the `.csv` extension; internally fetches the `.json` data instead; if no `_stream` it simply converts that JSON to CSV with the correct content type header; if `_stream=1` it checks for `_next=` and throws an error if it was provided - otherwise it fetches the first page, emits the CSV header and the first set of rows, then starts async looping, emitting more CSV rows and following the `_next=` internal reference until done (see the streaming sketch after this table). I like that this takes advantage of efficient pagination. It may not work so well for views which use offset/limit though. It won't work at all for custom SQL, because custom SQL doesn't support `_next=` pagination. That's fine. For views the easiest fix is to cut off after the first X000 records. That seems OK. The view JSON would need to include a property that the mechanism can identify. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | 323681589 | |
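A minimal sketch of the foreign key expansion described in the first comment above, assuming a hypothetical `city_labels` dict standing in for the related table's `label_column` values. With `?_expand=1` each foreign key column appears twice: the raw value, then its label.

```python
import csv
import io

# Hypothetical label lookup: in Datasette these labels would come from
# the related table's label_column, not a hard-coded dict.
city_labels = {1: "San Francisco", 2: "Los Angeles"}

rows = [
    (1, 1, "CA"),
    (2, 2, "CA"),
]

output = io.StringIO()
writer = csv.writer(output)
# With ?_expand=1 each foreign key column is shown twice:
# the raw value, then its label.
writer.writerow(["rowid", "city_id", "city_id_label", "state"])
for rowid, city_id, state in rows:
    writer.writerow([rowid, city_id, city_labels.get(city_id, ""), state])

print(output.getvalue())
# rowid,city_id,city_id_label,state
# 1,1,San Francisco,CA
# 2,2,Los Angeles,CA
```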
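A sketch of the optional `?_encoding=windows-1252` idea from the encoding discussion. The parameter name comes from the tweet quoted above; the helper itself is hypothetical, not Datasette's API. UTF-8 output aimed at Excel can be prefixed with a byte order mark, and legacy output replaces characters outside the target repertoire rather than erroring partway through a stream.

```python
# Hypothetical helper sketching the ?_encoding= behaviour.
def encode_csv_chunk(chunk: str, encoding: str = "utf-8") -> bytes:
    if encoding == "utf-8":
        return chunk.encode("utf-8")
    # Replace anything outside the target repertoire instead of
    # crashing partway through a streamed response.
    return chunk.encode(encoding, errors="replace")

# UTF-8 aimed at Excel can be prefixed with a byte order mark:
BOM = "\ufeff".encode("utf-8")  # b'\xef\xbb\xbf'

utf8_body = BOM + encode_csv_chunk("name,city\r\nJosé,München\r\n")
legacy_body = encode_csv_chunk("name,city\r\nJosé,München\r\n", "windows-1252")
```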
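A response-streaming sketch adapted from the Sanic docs linked in that comment, using Sanic's 2018-era `sanic.response.stream` API (newer Sanic versions stream differently); the route and CSV payload here are illustrative.

```python
from sanic import Sanic
from sanic.response import stream

app = Sanic(__name__)

@app.route("/table.csv")
async def table_csv(request):
    async def stream_csv(response):
        await response.write("rowid,city_id,state\r\n")
        # The real implementation would fetch a page of rows, write
        # them, then follow the _next= token for the next page.
        await response.write("1,1,CA\r\n")

    return stream(stream_csv, content_type="text/csv; charset=utf-8")
```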
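A sketch of a CSV Dialect description per the frictionlessdata spec linked above. The key names follow that spec as I understand it; the values shown describe the output of Python's default `excel` dialect.

```python
import json

# Key names follow the CSV Dialect spec; the values here describe
# what Python's default "excel" dialect produces.
dialect = {
    "csvddfVersion": 1.2,
    "delimiter": ",",
    "lineTerminator": "\r\n",
    "quoteChar": "\"",
    "doubleQuote": True,
    "header": True,
}

print(json.dumps(dialect, indent=2))
```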
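As for which arguments to pass to Python's `csv.writer()`: its default `excel` dialect already produces comma-delimited, minimally quoted, CRLF-terminated output, which matches what most consumers expect. Spelling those defaults out explicitly:

```python
import csv
import io

output = io.StringIO()
writer = csv.writer(
    output,
    delimiter=",",
    quotechar='"',
    doublequote=True,           # escape embedded quotes by doubling them
    quoting=csv.QUOTE_MINIMAL,  # quote only fields that need it
    lineterminator="\r\n",      # RFC 4180 line endings
)
writer.writerow(["id", "note"])
writer.writerow([1, 'contains "quotes", commas'])
print(output.getvalue())
```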
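Finally, a minimal sketch of the streaming loop from the implementation comment. `fetch_json_page()` is a hypothetical stand-in for the internal call that renders one keyset page of the `.json` view; the real code would live in `BaseView`.

```python
import csv
import io

# fetch_json_page(token) is assumed to return a dict shaped like the
# .json view output: {"columns": [...], "rows": [...], "next": token-or-None}.
async def stream_all_rows(fetch_json_page):
    """Async generator yielding CSV text chunks for every matching row."""
    next_token = None
    first_page = True
    while True:
        page = await fetch_json_page(next_token)
        buffer = io.StringIO()
        writer = csv.writer(buffer)
        if first_page:
            writer.writerow(page["columns"])  # emit the header only once
            first_page = False
        writer.writerows(page["rows"])
        yield buffer.getvalue()
        next_token = page.get("next")  # follow the _next= reference
        if not next_token:
            break  # no more pages: the stream is complete
```

A caller would `async for chunk in stream_all_rows(fetch)` and write each chunk to the streaming response, which is how this would slot into the Sanic sketch above.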