issue_comments

17 rows where "updated_at" is on date 2022-01-08 sorted by updated_at descending


issue 5

  • `--batch-size 1` doesn't seem to commit for every item 6
  • create-index should run analyze after creating index 6
  • Python library methods for calling ANALYZE 2
  • Add `sqlite_stat1`(-4) tables to hidden table list 2
  • introduce new option for datasette package to use a slim base image 1

user 2

  • simonw 12
  • fgregg 5

author_association 2

  • OWNER 12
  • CONTRIBUTOR 5
id html_url issue_url node_id user created_at updated_at author_association body reactions issue performed_via_github_app
1008166084 https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1008166084 https://api.github.com/repos/simonw/sqlite-utils/issues/365 IC_kwDOCGYnMM48F2TE fgregg 536941 2022-01-08T22:32:47Z 2022-01-08T22:32:47Z CONTRIBUTOR

Or using `PRAGMA optimize`.
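A minimal sketch of the two approaches using Python's standard `sqlite3` module: an explicit `ANALYZE` after `CREATE INDEX`, or `PRAGMA optimize`, which lets SQLite decide whether re-analysis is worthwhile. The `dogs` table and `idx_dogs_name` index are hypothetical. Whether `PRAGMA optimize` actually runs `ANALYZE` depends on the SQLite version and its own heuristics, so only the explicit path is relied on here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and index, for illustration only.
conn.execute("CREATE TABLE dogs (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO dogs (name) VALUES (?)", [("Cleo",), ("Pancakes",), ("Fido",)])
conn.execute("CREATE INDEX idx_dogs_name ON dogs (name)")

# Option 1: gather statistics explicitly; this creates and populates sqlite_stat1.
conn.execute("ANALYZE")

# Option 2: let SQLite decide; it may or may not re-run ANALYZE, depending on version.
conn.execute("PRAGMA optimize")

stats = conn.execute("SELECT tbl, idx, stat FROM sqlite_stat1").fetchall()
print(stats)
```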

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
create-index should run analyze after creating index 1096558279  
1008164786 https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1008164786 https://api.github.com/repos/simonw/sqlite-utils/issues/365 IC_kwDOCGYnMM48F1-y fgregg 536941 2022-01-08T22:24:19Z 2022-01-08T22:24:19Z CONTRIBUTOR

The out-of-date scenario you describe could be addressed by automatically adding an ANALYZE to the insert or convert commands if they affect an index.
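One way to read this suggestion, sketched with the standard `sqlite3` module: after inserting, check whether the table has any indexes (via `PRAGMA index_list`) and, if so, run `ANALYZE` on just that table. The `insert_rows` helper is hypothetical and not part of sqlite-utils.

```python
import sqlite3

def insert_rows(conn, table, columns, rows):
    """Hypothetical helper: insert rows, then refresh statistics if the table is indexed.

    Returns True if ANALYZE was run, False otherwise.
    """
    placeholders = ", ".join("?" for _ in columns)
    cols = ", ".join(f'"{c}"' for c in columns)
    with conn:  # commit the insert before gathering statistics
        conn.executemany(f'INSERT INTO "{table}" ({cols}) VALUES ({placeholders})', rows)
    # PRAGMA index_list returns one row per index on the table (empty if none).
    indexes = conn.execute(f'PRAGMA index_list("{table}")').fetchall()
    if indexes:
        conn.execute(f'ANALYZE "{table}"')  # analyze only this table
        return True
    return False
```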

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
create-index should run analyze after creating index 1096558279  
1008164116 https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1008164116 https://api.github.com/repos/simonw/sqlite-utils/issues/365 IC_kwDOCGYnMM48F10U fgregg 536941 2022-01-08T22:18:57Z 2022-01-08T22:18:57Z CONTRIBUTOR

The table the query ran so badly on had about 50k rows.

I think that scenario should not be worse than having no stats.

I also did not know that SQLite was so different from Postgres and needed an explicit ANALYZE call.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
create-index should run analyze after creating index 1096558279  
1008163050 https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1008163050 https://api.github.com/repos/simonw/sqlite-utils/issues/365 IC_kwDOCGYnMM48F1jq simonw 9599 2022-01-08T22:10:51Z 2022-01-08T22:10:51Z OWNER

Is there a downside to having a sqlite_stat1 table if it has wildly incorrect statistics in it?

Imagine the following sequence of events:

  • User imports a few records, creating the table, using sqlite-utils insert
  • User runs sqlite-utils create-index ... which also creates and populates the sqlite_stat1 table
  • User runs insert again to populate several million new records

The user now has a database file with several million records and a statistics table that is wildly out of date, having been populated when they only had a few.

Will this result in surprisingly bad query performance compared to if that statistics table did not exist at all?

If so, I lean much harder towards ANALYZE as a strictly opt-in optimization, maybe with the --analyze option added to sqlite-utils insert too, to help users opt in to updating their statistics after running big inserts.
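The sequence of events above can be reproduced with the standard `sqlite3` module. This is a sketch: the `records` table and index name are made up, and a plain `ANALYZE` stands in for whatever `create-index` would run. It only shows that `sqlite_stat1` keeps the old row count until `ANALYZE` runs again; it does not measure the query-performance effect the question is about.

```python
import sqlite3

def row_estimate(conn, index_name):
    """Return the row count that sqlite_stat1 currently records for an index."""
    stat = conn.execute(
        "SELECT stat FROM sqlite_stat1 WHERE idx = ?", (index_name,)
    ).fetchone()[0]
    return int(stat.split()[0])

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, name TEXT)")

# Step 1: import a few records.
with conn:
    conn.executemany("INSERT INTO records (name) VALUES (?)", [(f"r{i}",) for i in range(3)])

# Step 2: create an index and gather statistics.
conn.execute("CREATE INDEX idx_records_name ON records (name)")
conn.execute("ANALYZE")
before = row_estimate(conn, "idx_records_name")

# Step 3: a much bigger insert, with no ANALYZE afterwards.
with conn:
    conn.executemany("INSERT INTO records (name) VALUES (?)", [(f"s{i}",) for i in range(10_000)])
stale = row_estimate(conn, "idx_records_name")

# Re-running ANALYZE brings the statistics back in line with the data.
conn.execute("ANALYZE")
fresh = row_estimate(conn, "idx_records_name")
print(before, stale, fresh)
```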

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
create-index should run analyze after creating index 1096558279  
1008161965 https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1008161965 https://api.github.com/repos/simonw/sqlite-utils/issues/365 IC_kwDOCGYnMM48F1St fgregg 536941 2022-01-08T22:02:56Z 2022-01-08T22:02:56Z CONTRIBUTOR

For options 2 and 3, I would worry about discoverability.

In other databases it is not necessary to explicitly call ANALYZE for most indexes, e.g. for Postgres:

The system regularly collects statistics on all of a table's columns. Newly-created non-expression indexes can immediately use these statistics to determine an index's usefulness.

I suppose I would propose raising a warning when the stats table is created, explaining what is going on and telling users about a --no-analyze argument.
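A minimal sketch of that proposal with Python's `sqlite3` and `warnings` modules: run `ANALYZE` by default after creating an index, and warn only the first time this creates `sqlite_stat1`. The `create_index` function, its `analyze` parameter and the index naming scheme are hypothetical; `--no-analyze` is the flag proposed above, not an existing option.

```python
import sqlite3
import warnings

def create_index(conn, table, columns, analyze=True):
    """Hypothetical create-index helper that runs ANALYZE by default and warns
    the first time doing so creates the sqlite_stat1 table."""
    name = f"idx_{table}_{'_'.join(columns)}"
    cols = ", ".join(f'"{c}"' for c in columns)
    conn.execute(f'CREATE INDEX "{name}" ON "{table}" ({cols})')
    if not analyze:
        return name
    had_stats = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = 'sqlite_stat1'"
    ).fetchone() is not None
    conn.execute("ANALYZE")
    if not had_stats:
        warnings.warn(
            "Ran ANALYZE, which created the sqlite_stat1 statistics table; "
            "use --no-analyze (analyze=False here) to skip this.",
            UserWarning,
        )
    return name
```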

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
create-index should run analyze after creating index 1096558279  
1008158616 https://github.com/simonw/sqlite-utils/issues/366#issuecomment-1008158616 https://api.github.com/repos/simonw/sqlite-utils/issues/366 IC_kwDOCGYnMM48F0eY simonw 9599 2022-01-08T21:35:32Z 2022-01-08T21:35:32Z OWNER

Built a prototype in a branch, see #367.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Python library methods for calling ANALYZE 1096563265  
1008158357 https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1008158357 https://api.github.com/repos/simonw/sqlite-utils/issues/365 IC_kwDOCGYnMM48F0aV simonw 9599 2022-01-08T21:33:07Z 2022-01-08T21:33:07Z OWNER

The one thing that worries me a little about doing this by default is that it adds a surprising new table to the database: it may confuse users if they run create-index and their database suddenly has a new sqlite_stat1 table. See https://github.com/simonw/sqlite-utils/issues/366#issuecomment-1008157132

Options here are:

  • Do it anyway. People can tolerate a surprise table appearing when they create an index.
  • Only run ANALYZE if the user says sqlite-utils create-index ... --analyze
  • Use the --analyze option, but also automatically run ANALYZE if they create an index and the database they are working with already has a sqlite_stat1 table

I'm currently leaning towards that third option. @fgregg, any thoughts?
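The third option can be sketched like this with Python's `sqlite3`: `ANALYZE` runs when the caller opts in, or automatically when the database already has a `sqlite_stat1` table. The `create_index` and `has_stat1` helpers are hypothetical stand-ins for the CLI command and its `--analyze` option.

```python
import sqlite3

def has_stat1(conn):
    """True if the database already contains a sqlite_stat1 table."""
    return conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = 'sqlite_stat1'"
    ).fetchone() is not None

def create_index(conn, table, column, analyze=False):
    """Hypothetical option 3: opt in with analyze=True, but run ANALYZE
    automatically when statistics already exist. Returns True if it ran."""
    conn.execute(f'CREATE INDEX "idx_{table}_{column}" ON "{table}" ("{column}")')
    if analyze or has_stat1(conn):
        conn.execute("ANALYZE")
        return True
    return False
```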

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
create-index should run analyze after creating index 1096558279  
1008157998 https://github.com/simonw/datasette/issues/1587#issuecomment-1008157998 https://api.github.com/repos/simonw/datasette/issues/1587 IC_kwDOBm6k_c48F0Uu simonw 9599 2022-01-08T21:29:54Z 2022-01-08T21:29:54Z OWNER

Relevant code: https://github.com/simonw/datasette/blob/00a2895cd2dc42c63846216b36b2dc9f41170129/datasette/database.py#L339-L354

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Add `sqlite_stat1`(-4) tables to hidden table list 1097040427  
1008157908 https://github.com/simonw/datasette/issues/1587#issuecomment-1008157908 https://api.github.com/repos/simonw/datasette/issues/1587 IC_kwDOBm6k_c48F0TU simonw 9599 2022-01-08T21:29:06Z 2022-01-08T21:29:06Z OWNER

Depending on the SQLite version (and compile options) that ran ANALYZE, these can be called:

  • sqlite_stat1
  • sqlite_stat2
  • sqlite_stat3
  • sqlite_stat4
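A sketch of how the four names above could be detected for hiding, using Python's `re` and `sqlite3`. This is not Datasette's actual hidden-table code; `STAT_TABLE_RE` and `stat_tables` are made-up names.

```python
import re
import sqlite3

# Matches sqlite_stat1 through sqlite_stat4, per the list above.
STAT_TABLE_RE = re.compile(r"^sqlite_stat[1-4]$")

def stat_tables(conn):
    """Return the names of any SQLite statistics tables present (candidates for hiding)."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    )]
    return sorted(n for n in names if STAT_TABLE_RE.match(n))
```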
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Add `sqlite_stat1`(-4) tables to hidden table list 1097040427  
1008157132 https://github.com/simonw/sqlite-utils/issues/366#issuecomment-1008157132 https://api.github.com/repos/simonw/sqlite-utils/issues/366 IC_kwDOCGYnMM48F0HM simonw 9599 2022-01-08T21:23:08Z 2022-01-08T21:25:05Z OWNER

Running ANALYZE creates a new visible table called sqlite_stat1: https://www.sqlite.org/fileformat.html#the_sqlite_stat1_table

This should be added to the default list of hidden tables in Datasette.

It looks something like this:

| tbl | idx | stat |
|---------------------------------|------------------------------------|-----------|
| _counts | sqlite_autoindex__counts_1 | 5 1 |
| global-power-plants_fts_config | global-power-plants_fts_config | 1 1 |
| global-power-plants_fts_docsize | | 33643 |
| global-power-plants_fts_idx | global-power-plants_fts_idx | 199 40 1 |
| global-power-plants_fts_data | | 136 |
| global-power-plants | "global-power-plants_owner" | 33643 4 |
| global-power-plants | "global-power-plants_country_long" | 33643 202 |

In each such row, the sqlite_stat1.stat column will be a string consisting of a list of integers followed by zero or more arguments. The first integer in this list is the approximate number of rows in the index. (The number of rows in the index is the same as the number of rows in the table, except for partial indexes.) The second integer is the approximate number of rows in the index that have the same value in the first column of the index. The third integer is the number of rows in the index that have the same value for the first two columns. The N-th integer (for N>1) is the estimated average number of rows in the index which have the same value for the first N-1 columns. For a K-column index, there will be K+1 integers in the stat column. If the index is unique, then the last integer will be 1.
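To make the quoted format concrete, here is a small parser for the stat string plus a check against a real `ANALYZE` run, using Python's `sqlite3`. The `parse_stat` name and the two-column example table are assumptions for illustration; the parser ignores any trailing non-integer arguments rather than interpreting them.

```python
import sqlite3

def parse_stat(stat):
    """Split a sqlite_stat1.stat string into its leading integers.

    Returns (row_count, averages), where averages[i] is the estimated number of
    rows sharing the same values in the first i+1 index columns.
    """
    ints = []
    for token in stat.split():
        if not token.isdigit():
            break  # stop at the first non-integer argument
        ints.append(int(token))
    return ints[0], ints[1:]

# A unique two-column index over 10 rows; column a has 2 distinct values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(i % 2, i) for i in range(10)])
conn.execute("CREATE UNIQUE INDEX idx_t_a_b ON t (a, b)")
conn.execute("ANALYZE")
stat = conn.execute("SELECT stat FROM sqlite_stat1 WHERE idx = 'idx_t_a_b'").fetchone()[0]
rows, averages = parse_stat(stat)
# K = 2 columns, so K+1 = 3 integers; the last is 1 because the index is unique.
print(rows, averages)
```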

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Python library methods for calling ANALYZE 1096563265  
1008155916 https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008155916 https://api.github.com/repos/simonw/sqlite-utils/issues/364 IC_kwDOCGYnMM48Fz0M simonw 9599 2022-01-08T21:16:46Z 2022-01-08T21:16:46Z OWNER

No, chunks() seems to work OK in the test I just added.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
`--batch-size 1` doesn't seem to commit for every item 1095570074  
1008154873 https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008154873 https://api.github.com/repos/simonw/sqlite-utils/issues/364 IC_kwDOCGYnMM48Fzj5 simonw 9599 2022-01-08T21:11:55Z 2022-01-08T21:11:55Z OWNER

I'm suspicious that the `chunks()` utility function may not be working correctly:

```pycon
In [10]: [list(d) for d in list(chunks('abc', 5))]
Out[10]: [['a'], ['b'], ['c']]

In [11]: [list(d) for d in list(chunks('abcdefghi', 5))]
Out[11]: [['a'], ['b'], ['c'], ['d'], ['e'], ['f'], ['g'], ['h'], ['i']]

In [12]: [list(d) for d in list(chunks('abcdefghi', 3))]
Out[12]: [['a'], ['b'], ['c'], ['d'], ['e'], ['f'], ['g'], ['h'], ['i']]
```
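The outputs above are what a lazy `chunks()` produces when the outer `list()` exhausts the generator before any chunk is read, which is consistent with the follow-up comment that `chunks()` works OK. A sketch, assuming `chunks()` is implemented roughly as below (a re-implementation for illustration, not necessarily the library's exact code):

```python
import itertools

def chunks(sequence, size):
    """Lazily yield chunks of up to `size` items from `sequence`.

    Each chunk shares the underlying iterator, so it must be consumed
    before the next chunk is requested.
    """
    iterator = iter(sequence)
    for item in iterator:
        yield itertools.chain([item], itertools.islice(iterator, size - 1))

# Consuming each chunk as it is produced gives the expected grouping.
good = [list(d) for d in chunks("abcdefghi", 3)]

# Wrapping the generator in list() first pulls one item per chunk and leaves
# every islice() with an already-exhausted iterator, hence single-item chunks.
bad = [list(d) for d in list(chunks("abcdefghi", 3))]
print(good)
print(bad)
```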

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
`--batch-size 1` doesn't seem to commit for every item 1095570074  
1008153586 https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008153586 https://api.github.com/repos/simonw/sqlite-utils/issues/364 IC_kwDOCGYnMM48FzPy simonw 9599 2022-01-08T21:06:15Z 2022-01-08T21:06:15Z OWNER

I added a print statement after `for query, params in queries_and_params` and confirmed that something in the code waits until 16 records are available and then executes the inserts, even with `--batch-size 1`.
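For comparison, a sketch of the behavior `--batch-size 1` is expected to have: rows are pulled lazily from the input and each batch is committed before the next row is read. The `insert_in_batches` helper and `items` table are hypothetical, not sqlite-utils internals; this pins down the intended behavior but does not diagnose where the 16-record delay comes from.

```python
import sqlite3

def insert_in_batches(conn, rows, batch_size):
    """Hypothetical: insert rows, committing after every batch.

    Reads `rows` lazily, so with batch_size=1 each row is committed as soon
    as it arrives. Returns the number of commits made.
    """
    commits = 0
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) >= batch_size:
            with conn:  # commits on exit
                conn.executemany("INSERT INTO items (value) VALUES (?)", batch)
            commits += 1
            batch = []
    if batch:  # flush a final partial batch
        with conn:
            conn.executemany("INSERT INTO items (value) VALUES (?)", batch)
        commits += 1
    return commits

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (value INTEGER)")
print(insert_in_batches(conn, ((i,) for i in range(5)), batch_size=1))
```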

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
`--batch-size 1` doesn't seem to commit for every item 1095570074  
1008151884 https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008151884 https://api.github.com/repos/simonw/sqlite-utils/issues/364 IC_kwDOCGYnMM48Fy1M simonw 9599 2022-01-08T20:59:21Z 2022-01-08T20:59:21Z OWNER

(That Heroku example doesn't record the timestamp, which limits its usefulness)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
`--batch-size 1` doesn't seem to commit for every item 1095570074  
1008143248 https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008143248 https://api.github.com/repos/simonw/sqlite-utils/issues/364 IC_kwDOCGYnMM48FwuQ simonw 9599 2022-01-08T20:34:12Z 2022-01-08T20:34:12Z OWNER

Built that tool: https://github.com/simonw/stream-delay and https://pypi.org/project/stream-delay/

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
`--batch-size 1` doesn't seem to commit for every item 1095570074  
1008129841 https://github.com/simonw/sqlite-utils/issues/364#issuecomment-1008129841 https://api.github.com/repos/simonw/sqlite-utils/issues/364 IC_kwDOCGYnMM48Ftcx simonw 9599 2022-01-08T20:04:42Z 2022-01-08T20:04:42Z OWNER

It would be easier to test this if I had a utility for streaming out a file one line at a time.

There are a few recipes for this in https://superuser.com/questions/526242/cat-file-to-terminal-at-particular-speed-of-lines-per-second, but I'm going to build a quick stream-delay tool.
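A minimal Python sketch of such a tool. This is not the actual stream-delay implementation; the `stream_delay` function and its signature are made up for illustration.

```python
import time

def stream_delay(lines, out, delay):
    """Write each line to `out`, flushing and then sleeping `delay` seconds.

    Flushing after every line matters: without it, the consumer may receive
    the lines in buffered blocks rather than one at a time.
    """
    count = 0
    for line in lines:
        out.write(line)
        out.flush()
        count += 1
        time.sleep(delay)
    return count
```

Wrapped in a small script that calls `stream_delay(sys.stdin, sys.stdout, 0.5)`, its output could be piped into `sqlite-utils insert data.db items - --nl --batch-size 1` to watch whether rows arrive one at a time.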

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
`--batch-size 1` doesn't seem to commit for every item 1095570074  
1007844190 https://github.com/simonw/datasette/pull/1574#issuecomment-1007844190 https://api.github.com/repos/simonw/datasette/issues/1574 IC_kwDOBm6k_c48Ente fgregg 536941 2022-01-08T00:42:12Z 2022-01-08T00:42:12Z CONTRIBUTOR

Is there a reason not to always use the slim option?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
introduce new option for datasette package to use a slim base image 1084193403  

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
, [performed_via_github_app] TEXT);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);
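The filter at the top of this page (rows whose `updated_at` falls on 2022-01-08, newest first) corresponds roughly to a query like the one below. This is a sketch against a cut-down in-memory copy of the table, not the exact SQL Datasette generates; three rows are taken from this page and the fourth is a hypothetical row from another day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down version of the issue_comments schema above (subset of columns).
conn.execute(
    "CREATE TABLE [issue_comments] ([id] INTEGER PRIMARY KEY, [user] INTEGER, "
    "[updated_at] TEXT, [author_association] TEXT)"
)
rows = [
    (1008166084, 536941, "2022-01-08T22:32:47Z", "CONTRIBUTOR"),
    (1007844190, 536941, "2022-01-08T00:42:12Z", "CONTRIBUTOR"),
    (1008157998, 9599, "2022-01-08T21:29:54Z", "OWNER"),
    (1, 9599, "2022-01-09T10:00:00Z", "OWNER"),  # hypothetical row from another day
]
conn.executemany("INSERT INTO issue_comments VALUES (?, ?, ?, ?)", rows)

# date() understands ISO 8601 timestamps with a trailing "Z".
matches = conn.execute(
    """
    SELECT id, updated_at FROM issue_comments
    WHERE date(updated_at) = ?
    ORDER BY updated_at DESC
    """,
    ("2022-01-08",),
).fetchall()
print(matches)
```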
Powered by Datasette · Queries took 576.608ms · About: github-to-sqlite