html_url,issue_url,id,node_id,user,user_label,created_at,updated_at,author_association,body,reactions,issue,issue_label,performed_via_github_app
https://github.com/simonw/datasette/issues/1101#issuecomment-1399341761,https://api.github.com/repos/simonw/datasette/issues/1101,1399341761,IC_kwDOBm6k_c5TaELB,9599,simonw,2023-01-21T22:07:19Z,2023-01-21T22:07:19Z,OWNER,"Idea for supporting streaming with the `register_output_renderer` hook:

```python
@hookimpl
def register_output_renderer(datasette):
    return {
        ""extension"": ""test"",
        ""render"": render_demo,
        ""can_render"": can_render_demo,
        ""render_stream"": render_demo_stream, # This is new
    }
```
So there's a new `""render_stream""` key which can be returned, which if present means that the output renderer supports streaming.

I'll play around with the design of that function signature in:

- #1999
- #1062 ","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",749283032,register_output_renderer() should support streaming data,
https://github.com/simonw/datasette/issues/1101#issuecomment-1105642187,https://api.github.com/repos/simonw/datasette/issues/1101,1105642187,IC_kwDOBm6k_c5B5sLL,25778,eyeseast,2022-04-21T18:59:08Z,2022-04-21T18:59:08Z,CONTRIBUTOR,"Ha! That was your idea (and a good one).

But it's probably worth measuring to see what overhead it adds. It did require both passing in the database and making the whole thing `async`. 

Just timing the queries themselves:

1. [Using `AsGeoJSON(geometry) as geometry`](https://alltheplaces-datasette.fly.dev/alltheplaces?sql=select%0D%0A++id%2C%0D%0A++properties%2C%0D%0A++AsGeoJSON%28geometry%29+as+geometry%2C%0D%0A++spider%0D%0Afrom%0D%0A++places%0D%0Aorder+by%0D%0A++id%0D%0Alimit%0D%0A++1000) takes 10.235 ms
2. [Leaving as binary](https://alltheplaces-datasette.fly.dev/alltheplaces?sql=select%0D%0A++id%2C%0D%0A++properties%2C%0D%0A++geometry%2C%0D%0A++spider%0D%0Afrom%0D%0A++places%0D%0Aorder+by%0D%0A++id%0D%0Alimit%0D%0A++1000) takes 8.63 ms

Looking at the network panel:

1. Takes about 200 ms for the `fetch` request
2. Takes about 300 ms

I'm not sure how best to time the GeoJSON generation, but it would be interesting to check. Maybe I'll write a plugin to add query times to response headers.

The other thing to consider with async streaming is that it might be well-suited for a slower response. When I have to get the whole result and send a response in a fixed amount of time, I need the most efficient query possible. If I can hang onto a connection and get things one chunk at a time, maybe it's ok if there's some overhead.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",749283032,register_output_renderer() should support streaming data,
https://github.com/simonw/datasette/issues/1101#issuecomment-1105615625,https://api.github.com/repos/simonw/datasette/issues/1101,1105615625,IC_kwDOBm6k_c5B5lsJ,9599,simonw,2022-04-21T18:31:41Z,2022-04-21T18:32:22Z,OWNER,"The `datasette-geojson` plugin is actually an interesting case here, because of the way it converts SpatiaLite geometries into GeoJSON: https://github.com/eyeseast/datasette-geojson/blob/602c4477dc7ddadb1c0a156cbcd2ef6688a5921d/datasette_geojson/__init__.py#L61-L66

```python

    if isinstance(geometry, bytes):
        results = await db.execute(
            ""SELECT AsGeoJSON(:geometry)"", {""geometry"": geometry}
        )
        return geojson.loads(results.single_value())
```
That actually seems to work really well as-is, but it does worry me a bit that it ends up having to execute an extra `SELECT` query for every single returned row - especially in streaming mode where it might be asked to return 1m rows at once.

My PostgreSQL/MySQL engineering brain says that this would be better handled by doing a chunk of these (maybe 100) at once, to avoid the per-query-overhead - but with SQLite that might not be necessary.

At any rate, this is one of the reasons I'm interested in ""iterate over this sequence of chunks of 100 rows at a time"" as a potential option here.

Of course, a better solution would be for `datasette-geojson` to have a way to influence the SQL query before it is executed, adding a `AsGeoJSON(geometry)` clause to it - so that's something I'm open to as well.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",749283032,register_output_renderer() should support streaming data,
https://github.com/simonw/datasette/issues/1101#issuecomment-1105608964,https://api.github.com/repos/simonw/datasette/issues/1101,1105608964,IC_kwDOBm6k_c5B5kEE,9599,simonw,2022-04-21T18:26:29Z,2022-04-21T18:26:29Z,OWNER,"I'm questioning if the mechanisms should be separate at all now - a single response rendering is really just a case of a streaming response that only pulls the first N records from the iterator.

It probably needs to be an `async for` iterator, which I've not worked with much before. Good opportunity to learn.

This actually gets a fair bit more complicated due to the work I'm doing right now to improve the default JSON API:

- #1709

I want to do things like make faceting results optionally available to custom renderers - which is a separate concern from streaming rows.

I'm going to poke around with a bunch of prototypes and see what sticks.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",749283032,register_output_renderer() should support streaming data,
https://github.com/simonw/datasette/issues/1101#issuecomment-1105588651,https://api.github.com/repos/simonw/datasette/issues/1101,1105588651,IC_kwDOBm6k_c5B5fGr,25778,eyeseast,2022-04-21T18:15:39Z,2022-04-21T18:15:39Z,CONTRIBUTOR,"What if you split rendering and streaming into two things:

- `render` is a function that returns a response
- `stream` is a function that sends chunks, or yields chunks passed to an ASGI `send` callback

That way current plugins still work, and streaming is purely additive. A `stream` function could get a cursor or iterator of rows, instead of a list, so it could more efficiently handle large queries.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",749283032,register_output_renderer() should support streaming data,
https://github.com/simonw/datasette/issues/1101#issuecomment-1105571003,https://api.github.com/repos/simonw/datasette/issues/1101,1105571003,IC_kwDOBm6k_c5B5ay7,9599,simonw,2022-04-21T18:10:38Z,2022-04-21T18:10:46Z,OWNER,"Maybe the simplest design for this is to add an optional `can_stream` to the contract:

```python
    @hookimpl
    def register_output_renderer(datasette):
        return {
            ""extension"": ""tsv"",
            ""render"": render_tsv,
            ""can_render"": lambda: True,
            ""can_stream"": lambda: True
        }
```
When streaming, a new parameter could be passed to the render function - maybe `chunks` - which is an iterator/generator over a sequence of chunks of rows.

Or it could use the existing `rows` parameter but treat that as an iterator?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",749283032,register_output_renderer() should support streaming data,
https://github.com/simonw/datasette/issues/1101#issuecomment-869812567,https://api.github.com/repos/simonw/datasette/issues/1101,869812567,MDEyOklzc3VlQ29tbWVudDg2OTgxMjU2Nw==,9599,simonw,2021-06-28T16:06:57Z,2021-06-28T16:07:24Z,OWNER,"Relevant blog post: https://simonwillison.net/2021/Jun/25/streaming-large-api-responses/ - including notes on efficiently streaming formats with some kind of separator in between the records (regular JSON).

> Some export formats are friendlier for streaming than others. CSV and TSV are pretty easy to stream, as is newline-delimited JSON.
> 
> Regular JSON requires a bit more thought: you can output a `[` character, then output each row in a stream with a comma suffix, then skip the comma for the last row and output a `]`. Doing that requires peeking ahead (looping two at a time) to verify that you haven't yet reached the end.
> 
> Or... Martin De Wulf [pointed out](https://twitter.com/madewulf/status/1405559088994467844) that you can output the first row, then output every other row with a preceeding comma---which avoids the whole ""iterate two at a time"" problem entirely.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",749283032,register_output_renderer() should support streaming data,
https://github.com/simonw/datasette/issues/1101#issuecomment-869191854,https://api.github.com/repos/simonw/datasette/issues/1101,869191854,MDEyOklzc3VlQ29tbWVudDg2OTE5MTg1NA==,25778,eyeseast,2021-06-27T16:42:14Z,2021-06-27T16:42:14Z,CONTRIBUTOR,This would really help with this issue: https://github.com/eyeseast/datasette-geojson/issues/7,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",749283032,register_output_renderer() should support streaming data,
https://github.com/simonw/datasette/issues/1101#issuecomment-755134771,https://api.github.com/repos/simonw/datasette/issues/1101,755134771,MDEyOklzc3VlQ29tbWVudDc1NTEzNDc3MQ==,9599,simonw,2021-01-06T07:28:01Z,2021-01-06T07:28:01Z,OWNER,"With this structure it will become possible to stream non-newline-delimited JSON array-of-objects too - the `stream_rows()` method could output `[` first, then each row followed by a comma, then `]` after the very last row.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",749283032,register_output_renderer() should support streaming data,
https://github.com/simonw/datasette/issues/1101#issuecomment-755133937,https://api.github.com/repos/simonw/datasette/issues/1101,755133937,MDEyOklzc3VlQ29tbWVudDc1NTEzMzkzNw==,9599,simonw,2021-01-06T07:25:48Z,2021-01-06T07:26:43Z,OWNER,"Idea: instead of returning a dictionary, `register_output_renderer` could return an object. The object could have the following properties:

- `.extension` - the extension to use
- `.can_render(...)` - says if it can render this
- `.can_stream(...)` - says if streaming is supported
- `async .stream_rows(rows_iterator, send)` - method that loops through all rows and uses `send` to send them to the response in the correct format

I can then deprecate the existing `dict` return type for 1.0.","{""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",749283032,register_output_renderer() should support streaming data,
https://github.com/simonw/datasette/issues/1101#issuecomment-755128038,https://api.github.com/repos/simonw/datasette/issues/1101,755128038,MDEyOklzc3VlQ29tbWVudDc1NTEyODAzOA==,9599,simonw,2021-01-06T07:10:22Z,2021-01-06T07:10:22Z,OWNER,"Yet another use-case for this: I want to be able to stream newline-delimited JSON in order to better import into Pandas:

    pandas.read_json(""https://latest.datasette.io/fixtures/compound_three_primary_keys.json?_shape=array&_nl=on"", lines=True)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",749283032,register_output_renderer() should support streaming data,
https://github.com/simonw/datasette/issues/1101#issuecomment-732544590,https://api.github.com/repos/simonw/datasette/issues/1101,732544590,MDEyOklzc3VlQ29tbWVudDczMjU0NDU5MA==,9599,simonw,2020-11-24T02:22:55Z,2020-11-24T02:22:55Z,OWNER,"The trick I'm using here is to follow the `next_url` in order to paginate through all of the matching results. The loop calls the `data()` method multiple times, once for each page of results: https://github.com/simonw/datasette/blob/4bac9f18f9d04e5ed10f072502bcc508e365438e/datasette/views/base.py#L304-L307","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",749283032,register_output_renderer() should support streaming data,
https://github.com/simonw/datasette/issues/1101#issuecomment-732543700,https://api.github.com/repos/simonw/datasette/issues/1101,732543700,MDEyOklzc3VlQ29tbWVudDczMjU0MzcwMA==,9599,simonw,2020-11-24T02:20:30Z,2020-11-24T02:20:30Z,OWNER,"Current design: https://docs.datasette.io/en/stable/plugin_hooks.html#register-output-renderer-datasette

```python
@hookimpl
def register_output_renderer(datasette):
    return {
        ""extension"": ""test"",
        ""render"": render_demo,
        ""can_render"": can_render_demo,  # Optional
    }
```
Where `render_demo` looks something like this:
```python
async def render_demo(datasette, columns, rows):
    db = datasette.get_database()
    result = await db.execute(""select sqlite_version()"")
    first_row = "" | "".join(columns)
    lines = [first_row]
    lines.append(""="" * len(first_row))
    for row in rows:
        lines.append("" | "".join(row))
    return Response(
        ""\n"".join(lines),
        content_type=""text/plain; charset=utf-8"",
        headers={""x-sqlite-version"": result.first()[0]}
    )
```
Meanwhile here's where the CSV streaming mode is implemented: https://github.com/simonw/datasette/blob/4bac9f18f9d04e5ed10f072502bcc508e365438e/datasette/views/base.py#L297-L380","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",749283032,register_output_renderer() should support streaming data,