
issue_comments


20 rows where "created_at" is on date 2020-09-07 sorted by updated_at descending




issue 6

  • FTS table with 7 rows has _fts_docsize table with 9,141 rows 10
  • Handle case where subsequent records (after first batch) include extra columns 3
  • Turn on recursive_triggers by default 2
  • table.optimize() should delete junk rows from *_fts_docsize 2
  • OperationalError: cannot change into wal mode from within a transaction 2
  • More attractive indentation of created FTS table schema 1

user 2

  • simonw 18
  • simonwiles 2

author_association 2

  • OWNER 18
  • CONTRIBUTOR 2
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions issue performed_via_github_app
688544156 https://github.com/simonw/sqlite-utils/issues/154#issuecomment-688544156 https://api.github.com/repos/simonw/sqlite-utils/issues/154 MDEyOklzc3VlQ29tbWVudDY4ODU0NDE1Ng== simonw 9599 2020-09-07T23:47:10Z 2020-09-07T23:47:10Z OWNER

This is already covered in the tests though: https://github.com/simonw/sqlite-utils/blob/deb2eb013ff85bbc828ebc244a9654f0d9c3139e/tests/test_cli.py#L1300-L1328

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
OperationalError: cannot change into wal mode from within a transaction 695441530  
688543128 https://github.com/simonw/sqlite-utils/issues/154#issuecomment-688543128 https://api.github.com/repos/simonw/sqlite-utils/issues/154 MDEyOklzc3VlQ29tbWVudDY4ODU0MzEyOA== simonw 9599 2020-09-07T23:43:10Z 2020-09-07T23:43:10Z OWNER

Running this against the same file works:

    $ sqlite3 beta.db
    SQLite version 3.31.1 2020-01-27 19:55:54
    Enter ".help" for usage hints.
    sqlite> PRAGMA journal_mode=wal;
    wal
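The failure can also be reproduced from Python's sqlite3 module: switching to WAL fails inside an open transaction but succeeds in autocommit mode. A minimal sketch (the beta.db filename is illustrative):

```python
import os
import sqlite3
import tempfile

# WAL mode needs a file-backed database (":memory:" reports "memory").
path = os.path.join(tempfile.mkdtemp(), "beta.db")

# isolation_level=None puts the connection in autocommit mode, so no
# implicit transaction is open when the PRAGMA is issued.
conn = sqlite3.connect(path, isolation_level=None)

# Inside an explicit transaction the journal mode cannot be changed.
conn.execute("BEGIN")
try:
    conn.execute("PRAGMA journal_mode=wal;")
    error = None
except sqlite3.OperationalError as e:
    error = str(e)  # the "cannot change into wal mode" error from this issue
conn.execute("ROLLBACK")

# Outside a transaction the same PRAGMA succeeds and reports the new mode.
mode = conn.execute("PRAGMA journal_mode=wal;").fetchone()[0]
print(error, mode)
```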

OperationalError: cannot change into wal mode from within a transaction 695441530  
688500704 https://github.com/simonw/sqlite-utils/issues/152#issuecomment-688500704 https://api.github.com/repos/simonw/sqlite-utils/issues/152 MDEyOklzc3VlQ29tbWVudDY4ODUwMDcwNA== simonw 9599 2020-09-07T20:28:45Z 2020-09-07T21:17:48Z OWNER

The principal reason to turn these on - at least so far - is that without them weird things happen: FTS tables (in particular *_fts_docsize) grow without limit over time, because calls to INSERT OR REPLACE against the parent table cause additional rows to be inserted into *_fts_docsize even when a row was replaced rather than inserted.
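The underlying SQLite behavior can be demonstrated directly - a minimal sketch using a stand-in audit trigger rather than a real FTS shadow table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE licenses (key TEXT PRIMARY KEY, name TEXT);
CREATE TABLE audit (key TEXT);
-- Stand-in for the trigger that keeps an FTS shadow table in sync.
CREATE TRIGGER licenses_ad AFTER DELETE ON licenses
BEGIN
    INSERT INTO audit VALUES (old.key);
END;
""")

conn.execute("INSERT INTO licenses VALUES ('mit', 'MIT License')")

# With recursive_triggers off (the default), the implicit DELETE performed
# by INSERT OR REPLACE does not fire the delete trigger.
conn.execute("INSERT OR REPLACE INTO licenses VALUES ('mit', 'MIT')")
without = conn.execute("SELECT count(*) FROM audit").fetchone()[0]

# With recursive_triggers on, the implicit DELETE fires the trigger.
conn.execute("PRAGMA recursive_triggers=on")
conn.execute("INSERT OR REPLACE INTO licenses VALUES ('mit', 'MIT License')")
with_on = conn.execute("SELECT count(*) FROM audit").fetchone()[0]
print(without, with_on)
```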

Turn on recursive_triggers by default 695376054  
688511161 https://github.com/simonw/sqlite-utils/issues/153#issuecomment-688511161 https://api.github.com/repos/simonw/sqlite-utils/issues/153 MDEyOklzc3VlQ29tbWVudDY4ODUxMTE2MQ== simonw 9599 2020-09-07T21:07:20Z 2020-09-07T21:07:29Z OWNER

FTS4 uses a different column name here: https://datasette-sqlite-fts4.datasette.io/24ways-fts4/articles_fts_docsize

CREATE TABLE 'articles_fts_docsize'(docid INTEGER PRIMARY KEY, size BLOB);

table.optimize() should delete junk rows from *_fts_docsize 695377804  
688508510 https://github.com/simonw/sqlite-utils/pull/146#issuecomment-688508510 https://api.github.com/repos/simonw/sqlite-utils/issues/146 MDEyOklzc3VlQ29tbWVudDY4ODUwODUxMA== simonw 9599 2020-09-07T20:56:03Z 2020-09-07T20:56:24Z OWNER

The problem with this approach is that it requires us to consume the entire iterator before we can start inserting rows into the table - here on line 1052:

https://github.com/simonw/sqlite-utils/blob/bb131793feac16bc7181ab997568f941b0220ef2/sqlite_utils/db.py#L1047-L1054

I designed the .insert_all() to avoid doing this, because I want to be able to pass it an iterator (or more likely a generator) that could produce potentially millions of records. Doing things one batch of 100 records at a time means that the Python process doesn't need to pull millions of records into memory at once.

db-to-sqlite is one example of a tool that uses that characteristic, in https://github.com/simonw/db-to-sqlite/blob/63e4ee972f292de13bb11767c0fb64b35339d954/db_to_sqlite/cli.py#L94-L106

So we need to solve this issue without consuming the entire iterator with a records = list(records) call.

I think one way to do this is to execute each chunk one at a time and watch out for an exception that indicates that we sent too many parameters - then adjust the chunk size down and try again.
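That adjust-and-retry approach might look something like this sketch (insert_chunked is a hypothetical helper, not sqlite-utils API) - it pulls one chunk at a time from the iterator, so millions of records never need to be in memory at once:

```python
import itertools
import sqlite3

def insert_chunked(conn, table, columns, records, chunk_size=100):
    """Sketch: insert from an iterator one chunk at a time, splitting a
    chunk in half whenever SQLite rejects the statement for having too
    many bound parameters."""
    records = iter(records)
    row_sql = "(" + ", ".join(["?"] * len(columns)) + ")"
    while True:
        chunk = list(itertools.islice(records, chunk_size))
        if not chunk:
            break
        pending = [chunk]
        while pending:
            batch = pending.pop()
            sql = "INSERT INTO [{}] ({}) VALUES {}".format(
                table, ", ".join(columns), ", ".join([row_sql] * len(batch))
            )
            params = [value for row in batch for value in row]
            try:
                conn.execute(sql, params)
            except sqlite3.OperationalError as e:
                # Exceeded SQLITE_MAX_VARIABLE_NUMBER: retry in halves.
                if "too many SQL variables" in str(e) and len(batch) > 1:
                    half = len(batch) // 2
                    pending.extend([batch[half:], batch[:half]])
                else:
                    raise

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE [records] (a, b, c)")
insert_chunked(conn, "records", ["a", "b", "c"],
               ((n, n, n) for n in range(1000)))
inserted = conn.execute("SELECT count(*) FROM [records]").fetchone()[0]
print(inserted)
```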

Handle case where subsequent records (after first batch) include extra columns 688668680  
688506015 https://github.com/simonw/sqlite-utils/issues/153#issuecomment-688506015 https://api.github.com/repos/simonw/sqlite-utils/issues/153 MDEyOklzc3VlQ29tbWVudDY4ODUwNjAxNQ== simonw 9599 2020-09-07T20:46:58Z 2020-09-07T20:46:58Z OWNER

Writing a test for this will be a tiny bit tricky. I think I'll use a test that replicates the bug in #149.

table.optimize() should delete junk rows from *_fts_docsize 695377804  
688501064 https://github.com/simonw/sqlite-utils/issues/149#issuecomment-688501064 https://api.github.com/repos/simonw/sqlite-utils/issues/149 MDEyOklzc3VlQ29tbWVudDY4ODUwMTA2NA== simonw 9599 2020-09-07T20:30:15Z 2020-09-07T20:30:38Z OWNER

The second challenge here is cleaning up all of those junk rows in existing *_fts_docsize tables. Doing that just to the demo database from https://github-to-sqlite.dogsheep.net/github.db dropped its size from 22MB to 16MB! Here's the SQL:

    DELETE FROM [licenses_fts_docsize]
    WHERE id NOT IN (SELECT rowid FROM [licenses_fts]);

I can do that as part of the existing table.optimize() method, which optimizes FTS tables.
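That DELETE can be exercised against any FTS5 table. In this sketch the junk row is simulated by writing directly into the _docsize shadow table (only possible because SQLite's defensive mode is off by default - normally only trigger bugs like this one would leave such rows behind):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE notes USING fts5(body)")
conn.executemany("INSERT INTO notes VALUES (?)",
                 [("hello world",), ("goodbye",), ("sqlite",)])

# Simulate an orphaned row of the kind left behind when the delete
# trigger fails to fire, by writing straight into the shadow table.
conn.execute("INSERT INTO notes_docsize (id, sz) VALUES (999, x'00')")
before = conn.execute("SELECT count(*) FROM notes_docsize").fetchone()[0]

# The cleanup: drop docsize rows with no matching row in the FTS table.
conn.execute(
    "DELETE FROM [notes_docsize] WHERE id NOT IN (SELECT rowid FROM [notes])"
)
after = conn.execute("SELECT count(*) FROM notes_docsize").fetchone()[0]
print(before, after)
```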

FTS table with 7 rows has _fts_docsize table with 9,141 rows 695319258  
688500294 https://github.com/simonw/sqlite-utils/issues/152#issuecomment-688500294 https://api.github.com/repos/simonw/sqlite-utils/issues/152 MDEyOklzc3VlQ29tbWVudDY4ODUwMDI5NA== simonw 9599 2020-09-07T20:27:07Z 2020-09-07T20:27:07Z OWNER

I'm going to make this an argument to the Database() class constructor which defaults to True.
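A minimal sketch of what that might look like (a hypothetical wrapper, not the real sqlite-utils Database class):

```python
import sqlite3

class Database:
    def __init__(self, path, recursive_triggers=True):
        self.conn = sqlite3.connect(path)
        # Opt-out constructor argument, defaulting to True.
        if recursive_triggers:
            self.conn.execute("PRAGMA recursive_triggers=on;")

db = Database(":memory:")
# PRAGMA recursive_triggers with no argument reports the current setting.
value = db.conn.execute("PRAGMA recursive_triggers").fetchone()[0]
print(value)
```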

Turn on recursive_triggers by default 695376054  
688499924 https://github.com/simonw/sqlite-utils/issues/149#issuecomment-688499924 https://api.github.com/repos/simonw/sqlite-utils/issues/149 MDEyOklzc3VlQ29tbWVudDY4ODQ5OTkyNA== simonw 9599 2020-09-07T20:25:40Z 2020-09-07T20:25:50Z OWNER

https://www.sqlite.org/pragma.html#pragma_recursive_triggers says:

Prior to SQLite version 3.6.18 (2009-09-11), recursive triggers were not supported. The behavior of SQLite was always as if this pragma was set to OFF. Support for recursive triggers was added in version 3.6.18 but was initially turned OFF by default, for compatibility. Recursive triggers may be turned on by default in future versions of SQLite.

So I think the fix is to turn on recursive_triggers globally by default for sqlite-utils.

FTS table with 7 rows has _fts_docsize table with 9,141 rows 695319258  
688499650 https://github.com/simonw/sqlite-utils/issues/149#issuecomment-688499650 https://api.github.com/repos/simonw/sqlite-utils/issues/149 MDEyOklzc3VlQ29tbWVudDY4ODQ5OTY1MA== simonw 9599 2020-09-07T20:24:35Z 2020-09-07T20:24:35Z OWNER

This replicates the problem:

    (github-to-sqlite) /tmp % sqlite-utils tables --counts github.db | grep licenses
    {"table": "licenses", "count": 7},
    {"table": "licenses_fts_data", "count": 35},
    {"table": "licenses_fts_idx", "count": 16},
    {"table": "licenses_fts_docsize", "count": 9151},
    {"table": "licenses_fts_config", "count": 1},
    {"table": "licenses_fts", "count": 7},
    (github-to-sqlite) /tmp % github-to-sqlite repos github.db dogsheep
    (github-to-sqlite) /tmp % sqlite-utils tables --counts github.db | grep licenses
    {"table": "licenses", "count": 7},
    {"table": "licenses_fts_data", "count": 45},
    {"table": "licenses_fts_idx", "count": 26},
    {"table": "licenses_fts_docsize", "count": 9161},
    {"table": "licenses_fts_config", "count": 1},
    {"table": "licenses_fts", "count": 7},

Note how the number of rows in licenses_fts_docsize goes from 9151 to 9161.

The number went up by ten. I used tracing from #151 to show that the following SQL executed ten times:

    INSERT OR REPLACE INTO [licenses] ([key], [name], [node_id], [spdx_id], [url])
    VALUES (?, ?, ?, ?, ?);

Then I tried executing PRAGMA recursive_triggers=on; at the start of the script. This fixed the problem - running the script did not increase the number of rows in licenses_fts_docsize.

FTS table with 7 rows has _fts_docsize table with 9,141 rows 695319258  
688482355 https://github.com/simonw/sqlite-utils/issues/149#issuecomment-688482355 https://api.github.com/repos/simonw/sqlite-utils/issues/149 MDEyOklzc3VlQ29tbWVudDY4ODQ4MjM1NQ== simonw 9599 2020-09-07T19:22:51Z 2020-09-07T19:22:51Z OWNER

And the SQLite documentation says:

When the REPLACE conflict resolution strategy deletes rows in order to satisfy a constraint, delete triggers fire if and only if recursive triggers are enabled.

FTS table with 7 rows has _fts_docsize table with 9,141 rows 695319258  
688482055 https://github.com/simonw/sqlite-utils/issues/149#issuecomment-688482055 https://api.github.com/repos/simonw/sqlite-utils/issues/149 MDEyOklzc3VlQ29tbWVudDY4ODQ4MjA1NQ== simonw 9599 2020-09-07T19:21:42Z 2020-09-07T19:21:42Z OWNER

Using replace=True there executes INSERT OR REPLACE - and Dan Kennedy (SQLite maintainer) on the SQLite forums said this:

Are you using "REPLACE INTO", or "UPDATE OR REPLACE" on the "licenses" table without having first executed "PRAGMA recursive_triggers = 1"? The docs note that delete triggers will not be fired in this case, which would explain things. Second paragraph under "REPLACE" here:

https://www.sqlite.org/lang_conflict.html

FTS table with 7 rows has _fts_docsize table with 9,141 rows 695319258  
688481374 https://github.com/simonw/sqlite-utils/issues/149#issuecomment-688481374 https://api.github.com/repos/simonw/sqlite-utils/issues/149 MDEyOklzc3VlQ29tbWVudDY4ODQ4MTM3NA== simonw 9599 2020-09-07T19:19:08Z 2020-09-07T19:19:08Z OWNER

Reading through the code for github-to-sqlite repos - one of the things it does is call save_license for each repo:

https://github.com/dogsheep/github-to-sqlite/blob/39b2234253096bd579feed4e25104698b8ccd2ba/github_to_sqlite/utils.py#L259-L262

    def save_license(db, license):
        if license is None:
            return None
        return db["licenses"].insert(license, pk="key", replace=True).last_pk

FTS table with 7 rows has _fts_docsize table with 9,141 rows 695319258  
688481317 https://github.com/simonw/sqlite-utils/pull/146#issuecomment-688481317 https://api.github.com/repos/simonw/sqlite-utils/issues/146 MDEyOklzc3VlQ29tbWVudDY4ODQ4MTMxNw== simonwiles 96218 2020-09-07T19:18:55Z 2020-09-07T19:18:55Z CONTRIBUTOR

Just force-pushed to update d042f9c with more formatting changes to satisfy black==20.8b1 and pass the GitHub Actions "Test" workflow.

Handle case where subsequent records (after first batch) include extra columns 688668680  
688480665 https://github.com/simonw/sqlite-utils/issues/149#issuecomment-688480665 https://api.github.com/repos/simonw/sqlite-utils/issues/149 MDEyOklzc3VlQ29tbWVudDY4ODQ4MDY2NQ== simonw 9599 2020-09-07T19:16:20Z 2020-09-07T19:16:20Z OWNER

Aha! I have managed to replicate the bug:

    (github-to-sqlite) /tmp % sqlite-utils tables --counts github.db | grep licenses
    {"table": "licenses", "count": 7},
    {"table": "licenses_fts_data", "count": 35},
    {"table": "licenses_fts_idx", "count": 16},
    {"table": "licenses_fts_docsize", "count": 9151},
    {"table": "licenses_fts_config", "count": 1},
    {"table": "licenses_fts", "count": 7},
    (github-to-sqlite) /tmp % github-to-sqlite repos github.db dogsheep
    (github-to-sqlite) /tmp % sqlite-utils tables --counts github.db | grep licenses
    {"table": "licenses", "count": 7},
    {"table": "licenses_fts_data", "count": 45},
    {"table": "licenses_fts_idx", "count": 26},
    {"table": "licenses_fts_docsize", "count": 9161},
    {"table": "licenses_fts_config", "count": 1},
    {"table": "licenses_fts", "count": 7},

Note that the number of records in licenses_fts_docsize went from 9151 to 9161.

FTS table with 7 rows has _fts_docsize table with 9,141 rows 695319258  
688479163 https://github.com/simonw/sqlite-utils/pull/146#issuecomment-688479163 https://api.github.com/repos/simonw/sqlite-utils/issues/146 MDEyOklzc3VlQ29tbWVudDY4ODQ3OTE2Mw== simonwiles 96218 2020-09-07T19:10:33Z 2020-09-07T19:11:57Z CONTRIBUTOR

@simonw -- I've gone ahead and updated the documentation to reflect the changes introduced in this PR. IMO it's ready to merge now.

In writing the documentation changes, I began to wonder about the value and role of batch_size at all, tbh. May I assume it was originally intended to prevent using the entire row set to determine columns and column types, and that this was a performance consideration? If so, this PR entirely undermines its purpose. I've been passing in excess of 500,000 rows at a time to insert_all() with these changes, and although I'm sure the performance difference is measurable it's not really noticeable; given #145, I don't know that any performance advantages outweigh the problems this approach removes. What do you think about just dropping the argument and defaulting to the maximum batch_size permissible given SQLITE_MAX_VARS? Are there other reasons one might want to restrict batch_size that I've overlooked? I could open a new issue to discuss/implement this.

Of course the documentation will need to change again too if/when something is done about #147.
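A maximum batch size derived from SQLITE_MAX_VARS might be computed like this sketch (999 is the historical SQLITE_MAX_VARIABLE_NUMBER compile-time default; newer SQLite builds raise it to 32766):

```python
SQLITE_MAX_VARS = 999  # conservative compile-time default for older SQLite

def max_batch_size(num_columns, max_vars=SQLITE_MAX_VARS):
    # Each row in a multi-row INSERT consumes one bound parameter per
    # column, so a batch can hold at most this many rows.
    return max_vars // num_columns

print(max_batch_size(5))   # rows per INSERT for a 5-column table
print(max_batch_size(11))  # rows for the 11-column issue_comments table
```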

Handle case where subsequent records (after first batch) include extra columns 688668680  
688464181 https://github.com/simonw/sqlite-utils/issues/149#issuecomment-688464181 https://api.github.com/repos/simonw/sqlite-utils/issues/149 MDEyOklzc3VlQ29tbWVudDY4ODQ2NDE4MQ== simonw 9599 2020-09-07T18:19:54Z 2020-09-07T18:19:54Z OWNER

Even though that table doesn't declare an integer primary key it does have a rowid column: https://github-to-sqlite.dogsheep.net/github?sql=select+rowid%2C+%5Bkey%5D%2C+name%2C+spdx_id%2C+url%2C+node_id+from+licenses+order+by+%5Bkey%5D+limit+101

| rowid | key | name | spdx_id | url | node_id |
| --- | --- | --- | --- | --- | --- |
| 9150 | apache-2.0 | Apache License 2.0 | Apache-2.0 | https://api.github.com/licenses/apache-2.0 | MDc6TGljZW5zZTI= |
| 112 | bsd-3-clause | BSD 3-Clause "New" or "Revised" License | BSD-3-Clause | https://api.github.com/licenses/bsd-3-clause | MDc6TGljZW5zZTU= |

https://www.sqlite.org/rowidtable.html has this clue:

If the rowid is not aliased by INTEGER PRIMARY KEY then it is not persistent and might change. In particular the VACUUM command will change rowids for tables that do not declare an INTEGER PRIMARY KEY. Therefore, applications should not normally access the rowid directly, but instead use an INTEGER PRIMARY KEY.

FTS table with 7 rows has _fts_docsize table with 9,141 rows 695319258  
688460865 https://github.com/simonw/sqlite-utils/issues/149#issuecomment-688460865 https://api.github.com/repos/simonw/sqlite-utils/issues/149 MDEyOklzc3VlQ29tbWVudDY4ODQ2MDg2NQ== simonw 9599 2020-09-07T18:07:14Z 2020-09-07T18:07:14Z OWNER

Another likely culprit: licenses has a text primary key, so it's not using rowid:

    CREATE TABLE [licenses] (
       [key] TEXT PRIMARY KEY,
       [name] TEXT,
       [spdx_id] TEXT,
       [url] TEXT,
       [node_id] TEXT
    );

FTS table with 7 rows has _fts_docsize table with 9,141 rows 695319258  
688460729 https://github.com/simonw/sqlite-utils/issues/149#issuecomment-688460729 https://api.github.com/repos/simonw/sqlite-utils/issues/149 MDEyOklzc3VlQ29tbWVudDY4ODQ2MDcyOQ== simonw 9599 2020-09-07T18:06:44Z 2020-09-07T18:06:44Z OWNER

First posted on SQLite forum here but I'm pretty sure this is a bug in how sqlite-utils created those tables: https://sqlite.org/forum/forumpost/51aada1b45

FTS table with 7 rows has _fts_docsize table with 9,141 rows 695319258  
688434226 https://github.com/simonw/sqlite-utils/issues/148#issuecomment-688434226 https://api.github.com/repos/simonw/sqlite-utils/issues/148 MDEyOklzc3VlQ29tbWVudDY4ODQzNDIyNg== simonw 9599 2020-09-07T16:50:33Z 2020-09-07T16:50:33Z OWNER

This may be as easy as applying textwrap.dedent() to this: https://github.com/simonw/sqlite-utils/blob/0e62744da9a429093e3409575c1f881376b0361f/sqlite_utils/db.py#L778-L787

I could apply that to a few other queries in that code as well.
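For illustration, textwrap.dedent() strips the common leading whitespace that a triple-quoted string inherits from the surrounding Python indentation (create_fts_sql here is a hypothetical helper, not the actual db.py code):

```python
import textwrap

def create_fts_sql(table, columns):
    # Indented to read naturally inside the function, then dedented so
    # the schema SQLite stores has no stray leading whitespace.
    sql = textwrap.dedent("""
        CREATE VIRTUAL TABLE [{table}_fts] USING FTS5 (
            {columns},
            content=[{table}]
        )
    """).strip()
    return sql.format(
        table=table,
        columns=", ".join("[{}]".format(c) for c in columns),
    )

schema = create_fts_sql("licenses", ["name", "key"])
print(schema)
```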

More attractive indentation of created FTS table schema 695276328  


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
, [performed_via_github_app] TEXT);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);
Powered by Datasette · About: github-to-sqlite