home / github

Menu
  • Search all tables
  • GraphQL API

issue_comments

Table actions
  • GraphQL API for issue_comments

3 rows where author_association = "OWNER", issue = 749283032 and "updated_at" is on date 2022-04-21 sorted by updated_at descending

✎ View and edit SQL

This data as json, CSV (advanced)

Suggested facets: created_at (date), updated_at (date)

user 1

  • simonw 3

issue 1

  • register_output_renderer() should support streaming data · 3 ✖

author_association 1

  • OWNER · 3 ✖
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions issue performed_via_github_app
1105615625 https://github.com/simonw/datasette/issues/1101#issuecomment-1105615625 https://api.github.com/repos/simonw/datasette/issues/1101 IC_kwDOBm6k_c5B5lsJ simonw 9599 2022-04-21T18:31:41Z 2022-04-21T18:32:22Z OWNER

The datasette-geojson plugin is actually an interesting case here, because of the way it converts SpatiaLite geometries into GeoJSON: https://github.com/eyeseast/datasette-geojson/blob/602c4477dc7ddadb1c0a156cbcd2ef6688a5921d/datasette_geojson/init.py#L61-L66

```python

if isinstance(geometry, bytes):
    results = await db.execute(
        "SELECT AsGeoJSON(:geometry)", {"geometry": geometry}
    )
    return geojson.loads(results.single_value())

`` That actually seems to work really well as-is, but it does worry me a bit that it ends up having to execute an extraSELECT` query for every single returned row - especially in streaming mode where it might be asked to return 1m rows at once.

My PostgreSQL/MySQL engineering brain says that this would be better handled by doing a chunk of these (maybe 100) at once, to avoid the per-query-overhead - but with SQLite that might not be necessary.

At any rate, this is one of the reasons I'm interested in "iterate over this sequence of chunks of 100 rows at a time" as a potential option here.

Of course, a better solution would be for datasette-geojson to have a way to influence the SQL query before it is executed, adding a AsGeoJSON(geometry) clause to it - so that's something I'm open to as well.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
register_output_renderer() should support streaming data 749283032  
1105608964 https://github.com/simonw/datasette/issues/1101#issuecomment-1105608964 https://api.github.com/repos/simonw/datasette/issues/1101 IC_kwDOBm6k_c5B5kEE simonw 9599 2022-04-21T18:26:29Z 2022-04-21T18:26:29Z OWNER

I'm questioning if the mechanisms should be separate at all now - a single response rendering is really just a case of a streaming response that only pulls the first N records from the iterator.

It probably needs to be an async for iterator, which I've not worked with much before. Good opportunity to learn.

This actually gets a fair bit more complicated due to the work I'm doing right now to improve the default JSON API:

  • 1709

I want to do things like make faceting results optionally available to custom renderers - which is a separate concern from streaming rows.

I'm going to poke around with a bunch of prototypes and see what sticks.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
register_output_renderer() should support streaming data 749283032  
1105571003 https://github.com/simonw/datasette/issues/1101#issuecomment-1105571003 https://api.github.com/repos/simonw/datasette/issues/1101 IC_kwDOBm6k_c5B5ay7 simonw 9599 2022-04-21T18:10:38Z 2022-04-21T18:10:46Z OWNER

Maybe the simplest design for this is to add an optional can_stream to the contract:

python @hookimpl def register_output_renderer(datasette): return { "extension": "tsv", "render": render_tsv, "can_render": lambda: True, "can_stream": lambda: True } When streaming, a new parameter could be passed to the render function - maybe chunks - which is an iterator/generator over a sequence of chunks of rows.

Or it could use the existing rows parameter but treat that as an iterator?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
register_output_renderer() should support streaming data 749283032  

Advanced export

JSON shape: default, array, newline-delimited, object

CSV options:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
, [performed_via_github_app] TEXT);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);
Powered by Datasette · Queries took 12.273ms · About: github-to-sqlite