home / github


issue_comments


16 rows where "created_at" is on date 2020-05-01 and user = 9599 sorted by updated_at descending


issue 8

  • sqlite3.OperationalError: too many SQL variables in insert_all when using rows with varying numbers of columns 7
  • Create index on issue_comments(user) and other foreign keys 2
  • Mechanism for creating views if they don't yet exist 2
  • Add this repos_starred view 1
  • Add this view for seeing new releases 1
  • Cloud Run fails to serve database files larger than 32MB 1
  • Add view for better display of dependent repos 1
  • "sqlite-utils views" command 1

author_association 2

  • OWNER 9
  • MEMBER 7

user 1

  • simonw · 16
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions issue performed_via_github_app
622587177 https://github.com/simonw/sqlite-utils/issues/103#issuecomment-622587177 https://api.github.com/repos/simonw/sqlite-utils/issues/103 MDEyOklzc3VlQ29tbWVudDYyMjU4NzE3Nw== simonw 9599 2020-05-01T22:07:51Z 2020-05-01T22:07:51Z OWNER

This is my failed attempt to recreate the bug (plus some extra debugging output):

```diff
% git diff
diff --git a/sqlite_utils/db.py b/sqlite_utils/db.py
index dd49d5c..ea42aea 100644
--- a/sqlite_utils/db.py
+++ b/sqlite_utils/db.py
@@ -1013,7 +1013,11 @@ class Table(Queryable):
         assert (
             num_columns <= SQLITE_MAX_VARS
         ), "Rows can have a maximum of {} columns".format(SQLITE_MAX_VARS)
+        print("default batch_size = ", batch_size)
         batch_size = max(1, min(batch_size, SQLITE_MAX_VARS // num_columns))
+        print("new batch_size = {},num_columns = {}, MAX_VARS // num_columns = {}".format(
+            batch_size, num_columns, SQLITE_MAX_VARS // num_columns
+        ))
         self.last_rowid = None
         self.last_pk = None
         for chunk in chunks(itertools.chain([first_record], records), batch_size):
@@ -1124,6 +1128,9 @@ class Table(Queryable):
             )
             flat_values = list(itertools.chain(*values))
             queries_and_params = [(sql, flat_values)]
+            print(sql.count("?"), len(flat_values))
+
+            # print(json.dumps(queries_and_params, indent=4))
         with self.db.conn:
             for query, params in queries_and_params:
diff --git a/tests/test_create.py b/tests/test_create.py
index 5290cd8..52940df 100644
--- a/tests/test_create.py
+++ b/tests/test_create.py
@@ -853,3 +853,33 @@ def test_create_with_nested_bytes(fresh_db):
     record = {"id": 1, "data": {"foo": b"bytes"}}
     fresh_db["t"].insert(record)
     assert [{"id": 1, "data": '{"foo": "b\'bytes\'"}'}] == list(fresh_db["t"].rows)
+
+
+def test_create_throws_useful_error_with_increasing_number_of_columns(fresh_db):
+    # https://github.com/simonw/sqlite-utils/issues/103
+    def rows():
+        yield {"name": 0}
+        for i in range(1, 1001):
+            yield {
+                "name": i,
+                "age": i,
+                "size": i,
+                "name2": i,
+                "age2": i,
+                "size2": i,
+                "name3": i,
+                "age3": i,
+                "size3": i,
+                "name4": i,
+                "age4": i,
+                "size4": i,
+                "name5": i,
+                "age5": i,
+                "size5": i,
+                "name6": i,
+                "age6": i,
+                "size6": i,
+            }
+
+    fresh_db["t"].insert_all(rows())
+    assert 1001 == fresh_db["t"].count
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
sqlite3.OperationalError: too many SQL variables in insert_all when using rows with varying numbers of columns 610517472  
622584433 https://github.com/simonw/sqlite-utils/issues/103#issuecomment-622584433 https://api.github.com/repos/simonw/sqlite-utils/issues/103 MDEyOklzc3VlQ29tbWVudDYyMjU4NDQzMw== simonw 9599 2020-05-01T21:57:52Z 2020-05-01T21:57:52Z OWNER

@b0b5h4rp13 I'm having trouble creating a test that triggers this bug. Could you share a chunk of code that replicates what you're seeing here?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
sqlite3.OperationalError: too many SQL variables in insert_all when using rows with varying numbers of columns 610517472  
622565276 https://github.com/simonw/sqlite-utils/issues/103#issuecomment-622565276 https://api.github.com/repos/simonw/sqlite-utils/issues/103 MDEyOklzc3VlQ29tbWVudDYyMjU2NTI3Ng== simonw 9599 2020-05-01T20:57:16Z 2020-05-01T20:57:16Z OWNER

I'm reconsidering this: I think this is going to happen ANY time someone has at least one row that is wider than the first row. So at the very least I should show a more understandable error message.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
sqlite3.OperationalError: too many SQL variables in insert_all when using rows with varying numbers of columns 610517472  
622563188 https://github.com/simonw/sqlite-utils/issues/103#issuecomment-622563188 https://api.github.com/repos/simonw/sqlite-utils/issues/103 MDEyOklzc3VlQ29tbWVudDYyMjU2MzE4OA== simonw 9599 2020-05-01T20:51:24Z 2020-05-01T20:51:29Z OWNER

Hopefully anyone who runs into this problem in the future will search for and find this issue thread!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
sqlite3.OperationalError: too many SQL variables in insert_all when using rows with varying numbers of columns 610517472  
622563059 https://github.com/simonw/sqlite-utils/issues/103#issuecomment-622563059 https://api.github.com/repos/simonw/sqlite-utils/issues/103 MDEyOklzc3VlQ29tbWVudDYyMjU2MzA1OQ== simonw 9599 2020-05-01T20:51:01Z 2020-05-01T20:51:01Z OWNER

I'm not sure what to do about this.

I was thinking the solution would be to look at ALL of the rows in a batch before deciding on the maximum number of columns, but that doesn't work because we calculate batch size based on the number of columns!

I think my recommendation here is to manually pass a batch_size= argument to .insert_all() if you run into this error.
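The arithmetic behind the failure can be sketched in plain Python. This is an illustration of the reasoning above, not sqlite-utils code; `SQLITE_MAX_VARS` and the default batch size match the library's values at the time, while the row widths are made up:

```python
# Sketch of the batch-size arithmetic described in the comment above.
SQLITE_MAX_VARS = 999
DEFAULT_BATCH_SIZE = 100

first_row_columns = 1    # a narrow first record...
later_row_columns = 18   # ...followed by much wider records

# batch_size is derived from the FIRST record only
batch_size = max(1, min(DEFAULT_BATCH_SIZE, SQLITE_MAX_VARS // first_row_columns))

# a later batch of wide rows then needs this many "?" placeholders,
# which is the "too many SQL variables" condition:
variables_needed = batch_size * later_row_columns
print(variables_needed > SQLITE_MAX_VARS)

# the workaround: pass a batch_size= sized for the widest expected row
safe_batch_size = SQLITE_MAX_VARS // later_row_columns
print(safe_batch_size * later_row_columns <= SQLITE_MAX_VARS)
```

With these numbers, a batch of 100 wide rows needs 1800 variables against a limit of 999, while an explicit `batch_size=55` stays under it.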

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
sqlite3.OperationalError: too many SQL variables in insert_all when using rows with varying numbers of columns 610517472  
622561944 https://github.com/simonw/sqlite-utils/issues/103#issuecomment-622561944 https://api.github.com/repos/simonw/sqlite-utils/issues/103 MDEyOklzc3VlQ29tbWVudDYyMjU2MTk0NA== simonw 9599 2020-05-01T20:47:51Z 2020-05-01T20:47:51Z OWNER

Yup we only take the number of columns in the first record into account at the moment: https://github.com/simonw/sqlite-utils/blob/d56029549acae0b0ea94c5a0f783e3b3895d9218/sqlite_utils/db.py#L1007-L1016

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
sqlite3.OperationalError: too many SQL variables in insert_all when using rows with varying numbers of columns 610517472  
622561585 https://github.com/simonw/sqlite-utils/issues/103#issuecomment-622561585 https://api.github.com/repos/simonw/sqlite-utils/issues/103 MDEyOklzc3VlQ29tbWVudDYyMjU2MTU4NQ== simonw 9599 2020-05-01T20:46:50Z 2020-05-01T20:46:50Z OWNER

The varying number of columns thing is interesting - I don't think the tests cover that case much if at all.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
sqlite3.OperationalError: too many SQL variables in insert_all when using rows with varying numbers of columns 610517472  
622558889 https://github.com/simonw/sqlite-utils/issues/105#issuecomment-622558889 https://api.github.com/repos/simonw/sqlite-utils/issues/105 MDEyOklzc3VlQ29tbWVudDYyMjU1ODg4OQ== simonw 9599 2020-05-01T20:40:06Z 2020-05-01T20:40:06Z OWNER

Documentation: https://sqlite-utils.readthedocs.io/en/latest/cli.html#listing-views

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
"sqlite-utils views" command 610853576  
622461948 https://github.com/dogsheep/github-to-sqlite/issues/37#issuecomment-622461948 https://api.github.com/repos/dogsheep/github-to-sqlite/issues/37 MDEyOklzc3VlQ29tbWVudDYyMjQ2MTk0OA== simonw 9599 2020-05-01T16:36:42Z 2020-05-01T16:36:42Z MEMBER

It should only create views if the underlying tables exist.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Mechanism for creating views if they don't yet exist 610843136  
622461537 https://github.com/dogsheep/github-to-sqlite/issues/37#issuecomment-622461537 https://api.github.com/repos/dogsheep/github-to-sqlite/issues/37 MDEyOklzc3VlQ29tbWVudDYyMjQ2MTUzNw== simonw 9599 2020-05-01T16:35:40Z 2020-05-01T16:35:40Z MEMBER

This will check if the view exists and has the exact same definition as the one we want. If it doesn't, we will drop it (if it exists) and recreate it.
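A minimal sketch of that check using only the stdlib `sqlite3` module — the `ensure_view` helper and its exact `CREATE VIEW` formatting are assumptions for illustration, not github-to-sqlite's actual code. It relies on SQLite storing each view's creation statement verbatim in `sqlite_master`:

```python
import sqlite3


def ensure_view(conn, name, select_sql):
    # Sketch of the approach described above: recreate the view only
    # when it is missing or its stored definition differs.
    expected = "CREATE VIEW {} AS {}".format(name, select_sql)
    row = conn.execute(
        "select sql from sqlite_master where type = 'view' and name = ?",
        (name,),
    ).fetchone()
    if row is not None and row[0] == expected:
        return  # already up to date, nothing to do
    conn.execute("DROP VIEW IF EXISTS {}".format(name))
    conn.execute(expected)


conn = sqlite3.connect(":memory:")
conn.execute("create table repos (id integer primary key, full_name text)")
ensure_view(conn, "repo_names", "select full_name from repos")
ensure_view(conn, "repo_names", "select full_name from repos")  # no-op second time
```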

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Mechanism for creating views if they don't yet exist 610843136  
622461223 https://github.com/dogsheep/github-to-sqlite/issues/12#issuecomment-622461223 https://api.github.com/repos/dogsheep/github-to-sqlite/issues/12 MDEyOklzc3VlQ29tbWVudDYyMjQ2MTIyMw== simonw 9599 2020-05-01T16:34:52Z 2020-05-01T16:34:52Z MEMBER

Blocked on #37

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Add this view for seeing new releases 520756546  
622461122 https://github.com/dogsheep/github-to-sqlite/issues/10#issuecomment-622461122 https://api.github.com/repos/dogsheep/github-to-sqlite/issues/10 MDEyOklzc3VlQ29tbWVudDYyMjQ2MTEyMg== simonw 9599 2020-05-01T16:34:39Z 2020-05-01T16:34:39Z MEMBER

Blocked on #37

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Add this repos_starred view 516967682  
622461025 https://github.com/dogsheep/github-to-sqlite/issues/36#issuecomment-622461025 https://api.github.com/repos/dogsheep/github-to-sqlite/issues/36 MDEyOklzc3VlQ29tbWVudDYyMjQ2MTAyNQ== simonw 9599 2020-05-01T16:34:24Z 2020-05-01T16:34:24Z MEMBER

Blocked on #37

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Add view for better display of dependent repos 610842926  
622450636 https://github.com/simonw/datasette/issues/749#issuecomment-622450636 https://api.github.com/repos/simonw/datasette/issues/749 MDEyOklzc3VlQ29tbWVudDYyMjQ1MDYzNg== simonw 9599 2020-05-01T16:08:46Z 2020-05-01T16:08:46Z OWNER

Proposed solution: on Cloud Run don't show the "download database" link if the database file is larger than 32MB.

I can do this with a new config setting, max_db_mb, which is automatically set by the publish cloudrun command.

This is consistent with the existing max_csv_mb setting: https://datasette.readthedocs.io/en/stable/config.html#max-csv-mb

I should set max_csv_mb to 32MB on Cloud Run deploys as well.
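A sketch of the proposed check — `max_db_mb` and the helper name here are hypothetical, since the setting is only being proposed in this comment:

```python
import os

# Hypothetical sketch of the proposed max_db_mb check: hide the
# "download database" link when the file exceeds the Cloud Run limit.
MAX_DB_MB = 32


def allow_database_download(path, max_db_mb=MAX_DB_MB):
    # os.path.getsize returns bytes; compare against the MB limit
    return os.path.getsize(path) <= max_db_mb * 1024 * 1024
```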

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Cloud Run fails to serve database files larger than 32MB 610829227  
622214262 https://github.com/dogsheep/github-to-sqlite/issues/35#issuecomment-622214262 https://api.github.com/repos/dogsheep/github-to-sqlite/issues/35 MDEyOklzc3VlQ29tbWVudDYyMjIxNDI2Mg== simonw 9599 2020-05-01T02:10:32Z 2020-05-01T02:11:19Z MEMBER

This sped that query up even more - down to 4ms.

```sql
create index issue_comments_issue on issue_comments(issue);
```
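The effect of such an index is visible in `EXPLAIN QUERY PLAN` output. A quick stdlib check, with the table schema abbreviated to the relevant column:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table issue_comments (id integer primary key, issue integer, body text)"
)
conn.execute("create index issue_comments_issue on issue_comments(issue)")

# EXPLAIN QUERY PLAN shows whether SQLite uses the index or scans the table;
# the plan should mention SEARCH ... USING INDEX issue_comments_issue
plan = conn.execute(
    "explain query plan select * from issue_comments where issue = ?", (1,)
).fetchall()
plan_text = " ".join(str(row) for row in plan)
print(plan_text)
```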

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Create index on issue_comments(user) and other foreign keys 610511450  
622213950 https://github.com/dogsheep/github-to-sqlite/issues/35#issuecomment-622213950 https://api.github.com/repos/dogsheep/github-to-sqlite/issues/35 MDEyOklzc3VlQ29tbWVudDYyMjIxMzk1MA== simonw 9599 2020-05-01T02:09:04Z 2020-05-01T02:09:04Z MEMBER

It sped up this query a lot - 2.5s down to 300ms:

```sql
select
  repos.full_name,
  json_object(
    'href', 'https://github.com/' || repos.full_name || '/issues/' || issues.number,
    'label', '#' || issues.number
  ) as issue,
  issues.title,
  users.login,
  users.id,
  issues.state,
  issues.locked,
  issues.assignee,
  issues.milestone,
  issues.comments,
  issues.created_at,
  issues.updated_at,
  issues.closed_at,
  issues.author_association,
  issues.pull_request,
  issues.repo,
  issues.type
from
  issues
  join repos on repos.id = issues.repo
  join users on issues.user = users.id
where
  issues.state = 'open'
  and issues.user not in (9599, 27856297)
  and not exists (
    select id from issue_comments
    where issue_comments.user = 9599
    and issues.id = issue_comments.issue
  )
order by
  issues.updated_at desc;
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Create index on issue_comments(user) and other foreign keys 610511450  

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
, [performed_via_github_app] TEXT);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);
Powered by Datasette · Queries took 612.386ms · About: github-to-sqlite