issue_comments

6 rows where issue = 459882902, "updated_at" is on date 2022-09-27 and user = 9599 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at author_association body reactions issue performed_via_github_app
1259693536 https://github.com/simonw/datasette/issues/526#issuecomment-1259693536 https://api.github.com/repos/simonw/datasette/issues/526 IC_kwDOBm6k_c5LFWXg simonw 9599 2022-09-27T15:42:55Z 2022-09-27T15:42:55Z OWNER

It's interesting to note WHY the time limit works against this so well.

The time limit as-implemented looks like this:

https://github.com/simonw/datasette/blob/5f9f567acbc58c9fcd88af440e68034510fb5d2b/datasette/utils/__init__.py#L181-L201

The key here is `conn.set_progress_handler(handler, n)`, which specifies that the handler function should be called every `n` SQLite operations.

The handler function then checks to see if too much time has transpired and conditionally cancels the query.

This also doubles up as a "maximum number of operations" guard, which is what's happening when you attempt to fetch an infinite number of rows from an infinite table.
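As a rough sketch of that mechanism (illustrative only, not the implementation linked above - the function and parameter names here are invented):

```python
import sqlite3
import time

def execute_with_time_limit(conn: sqlite3.Connection, sql: str,
                            time_limit_ms: float = 1000, n_ops: int = 1000):
    """Run a query, interrupting it if it runs for longer than time_limit_ms."""
    start = time.perf_counter()

    def handler():
        # SQLite calls this roughly every n_ops virtual machine operations.
        # Returning a truthy value interrupts the running statement, which
        # surfaces in Python as sqlite3.OperationalError: interrupted.
        return (time.perf_counter() - start) * 1000 > time_limit_ms

    conn.set_progress_handler(handler, n_ops)
    try:
        return conn.execute(sql).fetchall()
    finally:
        # Clear the handler so it doesn't affect later queries on this connection.
        conn.set_progress_handler(None, n_ops)
```

Feeding the infinite recursive counter query from the comment below through a wrapper like this ends with sqlite3.OperationalError once the limit is hit, rather than running forever.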

That limit code could even be extended to say "exit the query after either 5s or 50,000,000 operations".

I don't think that's necessary though.

To be honest I'm having trouble with the idea of dropping max_returned_rows mainly because what Datasette does (allow arbitrary untrusted SQL queries) is dangerous, so I've designed in multiple redundant defence-in-depth mechanisms right from the start.

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
Stream all results for arbitrary SQL and canned queries 459882902  
1258906440 https://github.com/simonw/datasette/issues/526#issuecomment-1258906440 https://api.github.com/repos/simonw/datasette/issues/526 IC_kwDOBm6k_c5LCWNI simonw 9599 2022-09-27T03:04:37Z 2022-09-27T03:04:37Z OWNER

It would be really neat if we could explore this idea in a plugin, but I don't think Datasette has plugin hooks in the right place for that at the moment.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Stream all results for arbitrary SQL and canned queries 459882902  
1258905781 https://github.com/simonw/datasette/issues/526#issuecomment-1258905781 https://api.github.com/repos/simonw/datasette/issues/526 IC_kwDOBm6k_c5LCWC1 simonw 9599 2022-09-27T03:03:35Z 2022-09-27T03:03:47Z OWNER

Yes good point, the time limit does already protect against that. I've been contemplating a permissioned-users-only relaxation of that time limit too, and I got that idea mixed up with this one in my head.

On that basis maybe this feature would be safe after all? Would need to do some testing, but it may be that the existing time limit provides enough protection here already.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Stream all results for arbitrary SQL and canned queries 459882902  
1258864140 https://github.com/simonw/datasette/issues/526#issuecomment-1258864140 https://api.github.com/repos/simonw/datasette/issues/526 IC_kwDOBm6k_c5LCL4M simonw 9599 2022-09-27T01:55:32Z 2022-09-27T01:55:32Z OWNER

That recursive query is a great example of the kind of thing having a maximum row limit protects against.

Imagine if Datasette CSVs did allow unlimited retrievals. Someone could hit the CSV endpoint for that recursive query and tie up Datasette's SQL connection effectively forever.

Even if this feature becomes a permission-guarded thing we still need to take that case into account.

At the very least it would be good if the query could be cancelled if the client disconnects - so if someone accidentally starts an infinite query they can cancel the request and free up the server resources.
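One possible shape for that (a hypothetical sketch, assuming the ASGI server cancels the handler task when the client disconnects - none of these names are existing Datasette APIs):

```python
import asyncio
import sqlite3

async def run_streaming_query(conn: sqlite3.Connection, sql: str):
    """Run a long query in a worker thread; interrupt it if we get cancelled.

    Assumes conn was opened with check_same_thread=False so it can be used
    from the worker thread.
    """
    try:
        # Run the blocking query off the event loop.
        return await asyncio.to_thread(lambda: conn.execute(sql).fetchall())
    except asyncio.CancelledError:
        # If the server cancels this task because the client disconnected,
        # interrupt() aborts the statement still running in the worker
        # thread, freeing the connection for other requests.
        conn.interrupt()
        raise
```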

It might be a good idea to implement a page that shows "currently running" queries and allows users with the right permission to terminate them from that page.

Another option: a "limit of last resort" - either a very high row limit (10,000,000 perhaps) or even a time limit, saying that all queries will be cancelled if they take longer than thirty minutes or similar.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Stream all results for arbitrary SQL and canned queries 459882902  
1258860845 https://github.com/simonw/datasette/issues/526#issuecomment-1258860845 https://api.github.com/repos/simonw/datasette/issues/526 IC_kwDOBm6k_c5LCLEt simonw 9599 2022-09-27T01:48:31Z 2022-09-27T01:50:01Z OWNER

The protection is supposed to be from this line:

```python
rows = cursor.fetchmany(max_returned_rows + 1)
```

By capping the call to `.fetchmany()` at `max_returned_rows + 1` (the + 1 is to allow detection of whether or not there is a next page) I'm ensuring that Datasette never attempts to iterate over a huge result set.

SQLite and the sqlite3 library seem to handle this correctly. Here's an example:

```pycon
>>> import sqlite3
>>> conn = sqlite3.connect(":memory:")
>>> cursor = conn.execute("""
... with recursive counter(x) as (
...   select 0
...   union
...   select x + 1 from counter
... )
... select * from counter""")
>>> cursor.fetchmany(10)
[(0,), (1,), (2,), (3,), (4,), (5,), (6,), (7,), (8,), (9,)]
```

`counter` there is an infinitely long table (see TIL) - but we can retrieve the first 10 results without going into an infinite loop.
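The truncation check built on top of that is roughly this pattern (a sketch of the idea, not the exact Datasette code):

```python
def fetch_page(cursor, max_returned_rows=1000):
    # Ask for one more row than we intend to return; if we get it, we know
    # the result set was truncated and a "next page" exists.
    rows = cursor.fetchmany(max_returned_rows + 1)
    truncated = len(rows) > max_returned_rows
    return rows[:max_returned_rows], truncated
```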

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Stream all results for arbitrary SQL and canned queries 459882902  
1258846992 https://github.com/simonw/datasette/issues/526#issuecomment-1258846992 https://api.github.com/repos/simonw/datasette/issues/526 IC_kwDOBm6k_c5LCHsQ simonw 9599 2022-09-27T01:21:41Z 2022-09-27T01:21:41Z OWNER

My main concern here is that public Datasette instances could easily have all of their available database connections consumed by long-running queries - either accidentally or deliberately.

I do totally understand the need for this feature though. I think it can absolutely make sense provided it's protected by authentication and permissions.

Maybe even limit the number of concurrent downloads so that there's always at least one database connection free for other requests.
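That cap could be expressed with something as simple as a semaphore (illustrative only - MAX_CONCURRENT_STREAMS and stream_csv are made-up names, not Datasette settings):

```python
import asyncio

# Hypothetical: with e.g. three read connections in the pool, allow at most
# two streaming downloads at a time so one connection is always left free.
MAX_CONCURRENT_STREAMS = 2
_stream_semaphore = asyncio.Semaphore(MAX_CONCURRENT_STREAMS)

async def stream_csv(run_query):
    # Reject immediately rather than queueing, so a slow download can't
    # silently back up every other request.
    if _stream_semaphore.locked():
        raise RuntimeError("Too many streaming downloads in progress")
    async with _stream_semaphore:
        return await run_query()
```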

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Stream all results for arbitrary SQL and canned queries 459882902  

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
, [performed_via_github_app] TEXT);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);