issue_comments


10 rows where author_association = "OWNER", "updated_at" is on date 2020-12-12 and user = 9599, sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at author_association body reactions issue performed_via_github_app
743913004 https://github.com/simonw/datasette/issues/1142#issuecomment-743913004 https://api.github.com/repos/simonw/datasette/issues/1142 MDEyOklzc3VlQ29tbWVudDc0MzkxMzAwNA== simonw 9599 2020-12-12T22:17:46Z 2020-12-12T22:17:46Z OWNER

You're actually choosing between two options here: the 100 rows you can see on the screen, or the x,000 rows that match the current query.

Maybe a radio box would be more obvious?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
"Stream all rows" is not at all obvious 763361458  
743912875 https://github.com/simonw/datasette/issues/1142#issuecomment-743912875 https://api.github.com/repos/simonw/datasette/issues/1142 MDEyOklzc3VlQ29tbWVudDc0MzkxMjg3NQ== simonw 9599 2020-12-12T22:16:38Z 2020-12-12T22:16:38Z OWNER

Yeah, maybe with the number of rows to make it completely clear. Include all 2,455 rows perhaps.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
"Stream all rows" is not at all obvious 763361458  
743708524 https://github.com/simonw/sqlite-utils/pull/208#issuecomment-743708524 https://api.github.com/repos/simonw/sqlite-utils/issues/208 MDEyOklzc3VlQ29tbWVudDc0MzcwODUyNA== simonw 9599 2020-12-12T05:48:20Z 2020-12-12T05:48:32Z OWNER

% sqlite-utils analyze-tables ../datasette/fixtures.db facetable --column pk
1/1: ColumnDetails(table='facetable', column='pk', total_rows=15, num_null=0, num_blank=0, num_distinct=15, most_common=None, least_common=None)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
sqlite-utils analyze-tables command and table.analyze_column() method 763320133  
743708325 https://github.com/simonw/sqlite-utils/pull/208#issuecomment-743708325 https://api.github.com/repos/simonw/sqlite-utils/issues/208 MDEyOklzc3VlQ29tbWVudDc0MzcwODMyNQ== simonw 9599 2020-12-12T05:46:27Z 2020-12-12T05:46:27Z OWNER

It would be neat if you could optionally specify a subset of columns to analyze, using -c or --column.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
sqlite-utils analyze-tables command and table.analyze_column() method 763320133  
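
The -c/--column idea in the comment above could be expressed as a repeatable click option. A minimal sketch, assuming a single TABLE argument and a purely illustrative command body rather than the actual sqlite-utils implementation:

```python
import click
import sqlite_utils


@click.command(name="analyze-tables")
@click.argument("path")
@click.argument("table")
@click.option("-c", "--column", "columns", multiple=True, help="Only analyze these columns")
def analyze_tables(path, table, columns):
    "Analyze the columns of a single table (sketch only)."
    db = sqlite_utils.Database(path)
    for column in db[table].columns:
        # If any -c/--column options were given, only analyze those columns
        if columns and column.name not in columns:
            continue
        click.echo("{}: {}".format(table, column.name))


if __name__ == "__main__":
    analyze_tables()
```
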
743708169 https://github.com/simonw/sqlite-utils/pull/208#issuecomment-743708169 https://api.github.com/repos/simonw/sqlite-utils/issues/208 MDEyOklzc3VlQ29tbWVudDc0MzcwODE2OQ== simonw 9599 2020-12-12T05:44:46Z 2020-12-12T05:44:46Z OWNER

If there are fewer than ten values, is it worth outputting them twice, once in most_common and then in reverse in least_common? That feels redundant; I think I should leave least_common empty in that case.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
sqlite-utils analyze-tables command and table.analyze_column() method 763320133  
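
One way the decision above could look in code, sketched against the analyze_column() prototype later on this page; the helper name least_common_for and the values cut-off are illustrative, not part of sqlite-utils:

```python
def least_common_for(db, table, column, num_distinct, most_common, values=10):
    # If every distinct value already fits in most_common, least_common would
    # just repeat the same pairs in reverse, so leave it empty instead.
    if num_distinct <= values:
        return None
    return [
        (r[0], r[1])
        for r in db.execute(
            "select [{0}], count(*) from [{1}] group by [{0}] order by count(*) limit {2}".format(
                column, table, values
            )
        ).fetchall()
    ]
```
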
743708080 https://github.com/simonw/sqlite-utils/pull/208#issuecomment-743708080 https://api.github.com/repos/simonw/sqlite-utils/issues/208 MDEyOklzc3VlQ29tbWVudDc0MzcwODA4MA== simonw 9599 2020-12-12T05:43:45Z 2020-12-12T05:43:45Z OWNER

CLI output looks like this at the moment, which is bad:

% sqlite-utils analyze-tables ../datasette/fixtures.db facetable
1/10: ColumnDetails(table='facetable', column='pk', total_rows=15, num_null=0, num_blank=0, num_distinct=15, most_common=None, least_common=None)
2/10: ColumnDetails(table='facetable', column='created', total_rows=15, num_null=0, num_blank=0, num_distinct=4, most_common=[('2019-01-17 08:00:00', 4), ('2019-01-15 08:00:00', 4), ('2019-01-14 08:00:00', 4), ('2019-01-16 08:00:00', 3)], least_common=[('2019-01-16 08:00:00', 3), ('2019-01-14 08:00:00', 4), ('2019-01-15 08:00:00', 4), ('2019-01-17 08:00:00', 4)])
3/10: ColumnDetails(table='facetable', column='planet_int', total_rows=15, num_null=0, num_blank=0, num_distinct=2, most_common=[(1, 14), (2, 1)], least_common=[(2, 1), (1, 14)])
4/10: ColumnDetails(table='facetable', column='on_earth', total_rows=15, num_null=0, num_blank=0, num_distinct=2, most_common=[(1, 14), (0, 1)], least_common=[(0, 1), (1, 14)])
5/10: ColumnDetails(table='facetable', column='state', total_rows=15, num_null=0, num_blank=0, num_distinct=3, most_common=[('CA', 10), ('MI', 4), ('MC', 1)], least_common=[('MC', 1), ('MI', 4), ('CA', 10)])
6/10: ColumnDetails(table='facetable', column='city_id', total_rows=15, num_null=0, num_blank=0, num_distinct=4, most_common=[(1, 6), (3, 4), (2, 4), (4, 1)], least_common=[(4, 1), (2, 4), (3, 4), (1, 6)])
7/10: ColumnDetails(table='facetable', column='neighborhood', total_rows=15, num_null=0, num_blank=0, num_distinct=14, most_common=[('Downtown', 2), ('Tenderloin', 1), ('SOMA', 1), ('Mission', 1), ('Mexicantown', 1), ('Los Feliz', 1), ('Koreatown', 1), ('Hollywood', 1), ('Hayes Valley', 1), ('Greektown', 1)], least_common=[('Arcadia Planitia', 1), ('Bernal Heights', 1), ('Corktown', 1), ('Dogpatch', 1), ('Greektown', 1), ('Hayes Valley', 1), ('Hollywood', 1), ('Koreatown', 1), ('Los Feliz', 1), ('Mexicantown', 1)])
8/10: ColumnDetails(table='facetable', column='tags', total_rows=15, num_null=0, num_blank=0, num_distinct=3, most_common=[('[]', 13), ('["tag1", "tag3"]', 1), ('["tag1", "tag2"]', 1)], least_common=[('["tag1", "tag2"]', 1), ('["tag1", "tag3"]', 1), ('[]', 13)])
9/10: ColumnDetails(table='facetable', column='complex_array', total_rows=15, num_null=0, num_blank=0, num_distinct=2, most_common=[('[]', 14), ('[{"foo": "bar"}]', 1)], least_common=[('[{"foo": "bar"}]', 1), ('[]', 14)])
10/10: ColumnDetails(table='facetable', column='distinct_some_null', total_rows=15, num_null=13, num_blank=0, num_distinct=2, most_common=[(None, 13), ('two', 1), ('one', 1)], least_common=[('one', 1), ('two', 1), (None, 13)])
(sqlite-utils) sqlite-utils %

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
sqlite-utils analyze-tables command and table.analyze_column() method 763320133  
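
A sketch of how the output above could be made more readable, printing one labelled line per field instead of the raw namedtuple repr. The print_column_details name and the exact layout are illustrative; the field names are taken from the output shown above:

```python
def print_column_details(table, details, index, total):
    # Print a small labelled block per column instead of the namedtuple repr
    print("{}.{}: ({}/{})".format(table, details.column, index, total))
    print("  Total rows: {}".format(details.total_rows))
    print("  Null rows: {}".format(details.num_null))
    print("  Blank rows: {}".format(details.num_blank))
    print("  Distinct values: {}".format(details.num_distinct))
    if details.most_common:
        print("  Most common:")
        for value, count in details.most_common:
            print("    {}: {}".format(count, value))
    if details.least_common:
        print("  Least common:")
        for value, count in details.least_common:
            print("    {}: {}".format(count, value))
```
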
743707969 https://github.com/simonw/sqlite-utils/pull/208#issuecomment-743707969 https://api.github.com/repos/simonw/sqlite-utils/issues/208 MDEyOklzc3VlQ29tbWVudDc0MzcwNzk2OQ== simonw 9599 2020-12-12T05:42:26Z 2020-12-12T05:43:06Z OWNER

Should truncate values in the least/most common JSON array to a sensible length, otherwise you end up with stuff like this:

[
    [
        "b'\\x00\\x05barry\\x03\\x01\\x02\\x00\\x00\\x03cat\\x03\\x01\\x03\\x00\\x00\\x03dog\\x08\\x01\\x01\\x01\\x03\\x00\\x01\\x03\\x00\\x00\\x07panther\\x05\\x01\\x01\\x02\\x02\\x00\\x01\\x03uma\\x05\\x02\\x01\\x02\\x02\\x00\\x00\\x04sara\\x05\\x02\\x01\\x01\\x02\\x00\\x00\\x05terry\\x08\\x01\\x01\\x01\\x02\\x00\\x01\\x02\\x00\\x00\\x06weasel\\x05\\x02\\x01\\x01\\x03\\x00'",
        1
    ]
]

This example also shows that binary values (like those in _fts tables) look a bit weird, but I think I'm OK with that since binary data can't be represented neatly in JSON anyway.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
sqlite-utils analyze-tables command and table.analyze_column() method 763320133  
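
A possible truncation helper for the idea above. The value_truncate name and the 100-character cut-off are made up for the illustration, not taken from sqlite-utils; most_common/least_common pairs could then store the truncated form of each value instead of the raw value:

```python
def value_truncate(value, truncate=100):
    # Shorten long values (e.g. binary blobs from _fts tables) before they go
    # into the most_common/least_common lists; short values pass through as-is.
    text = str(value)
    if len(text) <= truncate:
        return text
    return text[:truncate] + "..."
```
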
743701697 https://github.com/simonw/sqlite-utils/issues/207#issuecomment-743701697 https://api.github.com/repos/simonw/sqlite-utils/issues/207 MDEyOklzc3VlQ29tbWVudDc0MzcwMTY5Nw== simonw 9599 2020-12-12T04:39:51Z 2020-12-12T04:39:51Z OWNER

CLI could be:

sqlite-utils analyze-tables

to analyze all tables, or:

sqlite-utils analyze-tables table1 table2

to analyze specific tables.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
sqlite-utils analyze-tables command 763283616  
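
A sketch of the two invocation styles described above using a variadic click argument: with no TABLES given every table is analyzed, otherwise only the named ones. Again an illustration of the idea, not the shipped command, and the loop body is a placeholder:

```python
import click
import sqlite_utils


@click.command(name="analyze-tables")
@click.argument("path")
@click.argument("tables", nargs=-1)
def analyze_tables(path, tables):
    "Analyze every table, or just the tables named on the command line (sketch only)."
    db = sqlite_utils.Database(path)
    # Fall back to all tables in the database when no TABLES were passed
    for table in tables or db.table_names():
        click.echo("Analyzing table: {}".format(table))


if __name__ == "__main__":
    analyze_tables()
```
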
743701599 https://github.com/simonw/sqlite-utils/issues/207#issuecomment-743701599 https://api.github.com/repos/simonw/sqlite-utils/issues/207 MDEyOklzc3VlQ29tbWVudDc0MzcwMTU5OQ== simonw 9599 2020-12-12T04:38:52Z 2020-12-12T04:39:07Z OWNER

I'll add a table.analyze_column(column) method which is used by the CLI tool, with a note that this is an unstable interface which may change in the future.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
sqlite-utils analyze-tables command 763283616  
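
Assuming the method lands roughly as described above, calling it from outside the CLI might look like this. The fixtures.db path is borrowed from the CLI examples on this page, and the exact signature is an assumption:

```python
import sqlite_utils

# Hypothetical use of the proposed (unstable) table.analyze_column() method,
# looping over every column the way the CLI tool would
db = sqlite_utils.Database("../datasette/fixtures.db")
table = db["facetable"]
for column in table.columns:
    print(table.analyze_column(column.name))
```
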
743701422 https://github.com/simonw/sqlite-utils/issues/207#issuecomment-743701422 https://api.github.com/repos/simonw/sqlite-utils/issues/207 MDEyOklzc3VlQ29tbWVudDc0MzcwMTQyMg== simonw 9599 2020-12-12T04:37:14Z 2020-12-12T04:38:25Z OWNER

Prototype:

```python
from collections import namedtuple

ColumnDetails = namedtuple(
    "ColumnDetails",
    ("column", "num_null", "num_blank", "num_distinct", "most_common", "least_common"),
)


def analyze_column(db, table, column, values=10):
    num_null = db.execute(
        "select count() from [{}] where [{}] is null".format(table, column)
    ).fetchone()[0]
    num_blank = db.execute(
        "select count() from [{}] where [{}] = ''".format(table, column)
    ).fetchone()[0]
    num_distinct = db.execute(
        "select count(distinct [{}]) from [{}]".format(column, table)
    ).fetchone()[0]
    most_common = None
    least_common = None
    if num_distinct != 1:
        most_common = [
            (r[0], r[1])
            for r in db.execute(
                "select [{}], count() from [{}] group by [{}] order by count() desc limit {}".format(
                    column, table, column, values
                )
            ).fetchall()
        ]
        if num_distinct <= values:
            # No need to run the query if it will just return the results in reverse order
            least_common = most_common[::-1]
        else:
            least_common = [
                (r[0], r[1])
                for r in db.execute(
                    "select [{}], count() from [{}] group by [{}] order by count() limit {}".format(
                        column, table, column, values
                    )
                ).fetchall()
            ]
    return ColumnDetails(column, num_null, num_blank, num_distinct, most_common, least_common)


def analyze_table(db, table):
    for column in db[table].columns:
        details = analyze_column(db, table, column.name)
        print(details)
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
sqlite-utils analyze-tables command 763283616  
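
A small usage sketch for the prototype above, reproducing the i/total progress prefix seen in the CLI output earlier on this page. The fixtures.db path comes from those examples, and the enumerate-based numbering is an assumption:

```python
import sqlite_utils

# Requires the ColumnDetails/analyze_column() prototype defined above
db = sqlite_utils.Database("../datasette/fixtures.db")
columns = db["facetable"].columns
for i, column in enumerate(columns, 1):
    details = analyze_column(db, "facetable", column.name)
    print("{}/{}: {}".format(i, len(columns), details))
```
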


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
, [performed_via_github_app] TEXT);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);