issue_comments

3 rows where issue = 459882902 and "updated_at" is on date 2019-06-24 sorted by updated_at descending
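
The row filter above corresponds roughly to the following query against the schema shown at the bottom of this page. This is a sketch of the equivalent SQL run from Python; the database file name is assumed, and it is not necessarily the exact statement Datasette generates.

import sqlite3

# "github.db" is an assumed file name for the database behind this page.
conn = sqlite3.connect("github.db")
rows = conn.execute(
    """
    select * from issue_comments
    where issue = ? and date(updated_at) = ?
    order by updated_at desc
    """,
    (459882902, "2019-06-24"),
).fetchall()
print(len(rows))  # 3 rows for this issue on 2019-06-24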

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions issue performed_via_github_app
505162238 https://github.com/simonw/datasette/issues/526#issuecomment-505162238 https://api.github.com/repos/simonw/datasette/issues/526 MDEyOklzc3VlQ29tbWVudDUwNTE2MjIzOA== simonw 9599 2019-06-24T20:14:51Z 2019-06-24T20:14:51Z OWNER

The other reason I didn't implement this in the first place is that adding offset/limit to a custom query (as opposed to a view) requires modifying the existing SQL - what if that SQL already has its own offset/limit clause?

It looks like I can solve that using a nested query:

select * from (
  select * from compound_three_primary_keys limit 1000
) limit 10 offset 100

https://latest.datasette.io/fixtures?sql=select++from+%28%0D%0A++select++from+compound_three_primary_keys+limit+1000%0D%0A%29+limit+10+offset+100

So I can wrap any user-provided SQL query in an outer offset/limit and implement pagination that way.
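
A minimal sketch of that wrapping step, assuming plain Python and sqlite3; the function name and sample data are illustrative, not Datasette's actual code.

import sqlite3

def wrap_with_pagination(user_sql, limit, offset):
    # Nest the user-provided query inside an outer select, so any
    # limit/offset it already contains is applied first and left untouched.
    return "select * from (\n{}\n) limit {} offset {}".format(
        user_sql, int(limit), int(offset)
    )

conn = sqlite3.connect(":memory:")
conn.execute("create table compound_three_primary_keys (pk1, pk2, pk3, content)")
conn.executemany(
    "insert into compound_three_primary_keys values (?, ?, ?, ?)",
    [(i, i, i, f"row {i}") for i in range(200)],
)

user_sql = "select * from compound_three_primary_keys limit 1000"
page = conn.execute(wrap_with_pagination(user_sql, limit=10, offset=100)).fetchall()
print(len(page))  # 10 rows, starting at row 100

Because the user's SQL becomes an inner subquery, an existing limit simply caps the inner result set before the outer offset/limit is applied.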

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Stream all results for arbitrary SQL and canned queries 459882902  
505161008 https://github.com/simonw/datasette/issues/526#issuecomment-505161008 https://api.github.com/repos/simonw/datasette/issues/526 MDEyOklzc3VlQ29tbWVudDUwNTE2MTAwOA== simonw 9599 2019-06-24T20:11:15Z 2019-06-24T20:11:15Z OWNER

Views already use offset/limit pagination so actually I may be over-thinking this.

Maybe the right thing to do here is to have the feature enabled by default, since it will work for the VAST majority of queries - the only ones that might cause problems are complex queries across millions of rows. It can continue to use aggressive internal time limits, so if someone DOES trigger something expensive they'll get an error.

I can allow users to disable the feature with a config setting, or increase the time limit if they need to.
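
For background, one way to impose that kind of internal time limit from Python is a SQLite progress handler that interrupts the statement once a deadline passes. This is a sketch of the general technique, not necessarily Datasette's exact implementation; the helper name is made up.

import sqlite3
import time

def execute_with_time_limit(conn, sql, params=None, limit_ms=1000):
    # The handler runs every 1000 SQLite VM instructions; returning a truthy
    # value aborts the statement with OperationalError ("interrupted").
    deadline = time.monotonic() + limit_ms / 1000
    conn.set_progress_handler(lambda: time.monotonic() > deadline, 1000)
    try:
        return conn.execute(sql, params or []).fetchall()
    finally:
        conn.set_progress_handler(None, 1000)

conn = sqlite3.connect(":memory:")
try:
    # A deliberately unbounded recursive query, to show the limit firing.
    execute_with_time_limit(
        conn,
        "with recursive n(v) as (select 1 union all select v + 1 from n) "
        "select count(*) from n",
        limit_ms=50,
    )
except sqlite3.OperationalError as ex:
    print("query aborted:", ex)

Interrupting inside the SQLite VM like this is what makes "they'll get an error" cheap to guarantee, since the expensive work stops as soon as the deadline is hit.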

Downgrading this from a medium to a small since it's much less effort to enable the existing pagination method for this type of query.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Stream all results for arbitrary SQL and canned queries 459882902  
505060332 https://github.com/simonw/datasette/issues/526#issuecomment-505060332 https://api.github.com/repos/simonw/datasette/issues/526 MDEyOklzc3VlQ29tbWVudDUwNTA2MDMzMg== simonw 9599 2019-06-24T15:28:16Z 2019-06-24T15:28:16Z OWNER

This is currently a deliberate feature decision.

The problem is that the streaming CSV feature relies on Datasette's automated, efficient pagination under the hood. When you stream a CSV you're actually causing Datasette to paginate through the full set of "pages", streaming each page out as a new chunk of CSV rows.

This mechanism only works if the next_url has been generated for the page. Currently the next_url is available for table views (where it uses the primary key or the sort column) and for views, but it's not set for canned queries because I can't be certain they can be efficiently paginated.
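
For comparison, the keyset-style pagination the table view can rely on only ever filters on the sort key, so every page costs about the same. This rough sketch with a made-up table illustrates the idea behind next_url, not its literal implementation.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table items (id integer primary key, body text)")
conn.executemany(
    "insert into items (body) values (?)",
    [(f"row {i}",) for i in range(1, 5001)],
)

def keyset_page(conn, after_id=0, page_size=100):
    # Seek straight to the right place via the primary key index:
    # page 50 is as cheap as page 1, unlike offset-based pagination.
    rows = conn.execute(
        "select id, body from items where id > ? order by id limit ?",
        (after_id, page_size),
    ).fetchall()
    next_after = rows[-1][0] if len(rows) == page_size else None
    return rows, next_after

rows, cursor = keyset_page(conn)
while cursor is not None:  # walk every page, like following next_url
    rows, cursor = keyset_page(conn, after_id=cursor)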

Offset/limit pagination for canned queries would be a pretty nasty performance hit, because each subsequent page would take progressively longer, as SQLite has to scan all the way through to the specified offset before it can return any rows.

This does seem like it's worth fixing though: pulling every row for a canned query would definitely be useful. The problem is that the pagination trick used elsewhere isn't right for canned queries - instead I would need to keep the database cursor open until ALL rows had been fetched. Figuring out how to do that efficiently within an asyncio-managed thread pool may take some thought.
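
A rough sketch of that "keep the cursor open" approach, assuming plain sqlite3; the helper name is hypothetical and it sidesteps Datasette's connection and thread-pool management.

import sqlite3

def stream_all_rows(db_path, sql, params=None, chunk_size=500):
    # Keep a single cursor open and drain it in chunks, instead of
    # re-running the query once per page.
    conn = sqlite3.connect(db_path)
    try:
        cursor = conn.execute(sql, params or [])
        while True:
            chunk = cursor.fetchmany(chunk_size)
            if not chunk:
                break
            yield from chunk  # hand each chunk to the CSV writer as it arrives
    finally:
        conn.close()

In an async server the blocking fetchmany() calls would need to happen on a worker thread (for example via loop.run_in_executor) so they stay off the event loop, which is the part that needs the careful thought mentioned above.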

Maybe this feature ends up as something that is turned off by default (due to the risk of it causing uptime problems for public sites) but that users working in their own private environments can turn on?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Stream all results for arbitrary SQL and canned queries 459882902  

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
, [performed_via_github_app] TEXT);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);