html_url,issue_url,id,node_id,user,user_label,created_at,updated_at,author_association,body,reactions,issue,issue_label,performed_via_github_app
https://github.com/simonw/sqlite-utils/pull/208#issuecomment-743707969,https://api.github.com/repos/simonw/sqlite-utils/issues/208,743707969,MDEyOklzc3VlQ29tbWVudDc0MzcwNzk2OQ==,9599,simonw,2020-12-12T05:42:26Z,2020-12-12T05:43:06Z,OWNER,"Should truncate values in the least/most common JSON array to a sensible length, otherwise you end up with stuff like this:

```json
[
    [
        ""b'\\x00\\x05barry\\x03\\x01\\x02\\x00\\x00\\x03cat\\x03\\x01\\x03\\x00\\x00\\x03dog\\x08\\x01\\x01\\x01\\x03\\x00\\x01\\x03\\x00\\x00\\x07panther\\x05\\x01\\x01\\x02\\x02\\x00\\x01\\x03uma\\x05\\x02\\x01\\x02\\x02\\x00\\x00\\x04sara\\x05\\x02\\x01\\x01\\x02\\x00\\x00\\x05terry\\x08\\x01\\x01\\x01\\x02\\x00\\x01\\x02\\x00\\x00\\x06weasel\\x05\\x02\\x01\\x01\\x03\\x00'"",
        1
    ]
]
```

This example also shows that binary values (like those in `_fts` tables) look a bit weird, but I think I'm OK with that since binary data can't be represented neatly in JSON anyway.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",763320133,sqlite-utils analyze-tables command and table.analyze_column() method,
https://github.com/simonw/sqlite-utils/pull/208#issuecomment-743708080,https://api.github.com/repos/simonw/sqlite-utils/issues/208,743708080,MDEyOklzc3VlQ29tbWVudDc0MzcwODA4MA==,9599,simonw,2020-12-12T05:43:45Z,2020-12-12T05:43:45Z,OWNER,"CLI output looks like this at the moment, which is bad:

```
% sqlite-utils analyze-tables ../datasette/fixtures.db facetable
1/10: ColumnDetails(table='facetable', column='pk', total_rows=15, num_null=0, num_blank=0, num_distinct=15, most_common=None, least_common=None)
2/10: ColumnDetails(table='facetable', column='created', total_rows=15, num_null=0, num_blank=0, num_distinct=4, most_common=[('2019-01-17 08:00:00', 4), ('2019-01-15 08:00:00', 4), ('2019-01-14 08:00:00', 4), ('2019-01-16 08:00:00', 3)], least_common=[('2019-01-16 08:00:00', 3), ('2019-01-14 08:00:00', 4), ('2019-01-15 08:00:00', 4), ('2019-01-17 08:00:00', 4)])
3/10: ColumnDetails(table='facetable', column='planet_int', total_rows=15, num_null=0, num_blank=0, num_distinct=2, most_common=[(1, 14), (2, 1)], least_common=[(2, 1), (1, 14)])
4/10: ColumnDetails(table='facetable', column='on_earth', total_rows=15, num_null=0, num_blank=0, num_distinct=2, most_common=[(1, 14), (0, 1)], least_common=[(0, 1), (1, 14)])
5/10: ColumnDetails(table='facetable', column='state', total_rows=15, num_null=0, num_blank=0, num_distinct=3, most_common=[('CA', 10), ('MI', 4), ('MC', 1)], least_common=[('MC', 1), ('MI', 4), ('CA', 10)])
6/10: ColumnDetails(table='facetable', column='city_id', total_rows=15, num_null=0, num_blank=0, num_distinct=4, most_common=[(1, 6), (3, 4), (2, 4), (4, 1)], least_common=[(4, 1), (2, 4), (3, 4), (1, 6)])
7/10: ColumnDetails(table='facetable', column='neighborhood', total_rows=15, num_null=0, num_blank=0, num_distinct=14, most_common=[('Downtown', 2), ('Tenderloin', 1), ('SOMA', 1), ('Mission', 1), ('Mexicantown', 1), ('Los Feliz', 1), ('Koreatown', 1), ('Hollywood', 1), ('Hayes Valley', 1), ('Greektown', 1)], least_common=[('Arcadia Planitia', 1), ('Bernal Heights', 1), ('Corktown', 1), ('Dogpatch', 1), ('Greektown', 1), ('Hayes Valley', 1), ('Hollywood', 1), ('Koreatown', 1), ('Los Feliz', 1), ('Mexicantown', 1)])
8/10: ColumnDetails(table='facetable', column='tags', total_rows=15, num_null=0, num_blank=0, num_distinct=3, most_common=[('[]', 13), ('[""tag1"", ""tag3""]', 1), ('[""tag1"", ""tag2""]', 1)], least_common=[('[""tag1"", ""tag2""]', 1), ('[""tag1"", ""tag3""]', 1), ('[]', 13)])
9/10: ColumnDetails(table='facetable', column='complex_array', total_rows=15, num_null=0, num_blank=0, num_distinct=2, most_common=[('[]', 14), ('[{""foo"": ""bar""}]', 1)], least_common=[('[{""foo"": ""bar""}]', 1), ('[]', 14)])
10/10: ColumnDetails(table='facetable', column='distinct_some_null', total_rows=15, num_null=13, num_blank=0, num_distinct=2, most_common=[(None, 13), ('two', 1), ('one', 1)], least_common=[('one', 1), ('two', 1), (None, 13)])
(sqlite-utils) sqlite-utils %
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",763320133,sqlite-utils analyze-tables command and table.analyze_column() method,
https://github.com/simonw/sqlite-utils/pull/208#issuecomment-743708169,https://api.github.com/repos/simonw/sqlite-utils/issues/208,743708169,MDEyOklzc3VlQ29tbWVudDc0MzcwODE2OQ==,9599,simonw,2020-12-12T05:44:46Z,2020-12-12T05:44:46Z,OWNER,"If there are less than ten values is it worth outputting them twice, once in `most_common` and then in reverse in `least_common`? Feels redundant - I think I should leave `least_common` empty in that case.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",763320133,sqlite-utils analyze-tables command and table.analyze_column() method,
https://github.com/simonw/sqlite-utils/pull/208#issuecomment-743708325,https://api.github.com/repos/simonw/sqlite-utils/issues/208,743708325,MDEyOklzc3VlQ29tbWVudDc0MzcwODMyNQ==,9599,simonw,2020-12-12T05:46:27Z,2020-12-12T05:46:27Z,OWNER,"It would be neat if you could optionally specify a subset of columns to analyze, using `-c` or `--column`.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",763320133,sqlite-utils analyze-tables command and table.analyze_column() method,
https://github.com/simonw/sqlite-utils/pull/208#issuecomment-743708524,https://api.github.com/repos/simonw/sqlite-utils/issues/208,743708524,MDEyOklzc3VlQ29tbWVudDc0MzcwODUyNA==,9599,simonw,2020-12-12T05:48:20Z,2020-12-12T05:48:32Z,OWNER,"```
% sqlite-utils analyze-tables ../datasette/fixtures.db facetable --column pk
1/1: ColumnDetails(table='facetable', column='pk', total_rows=15, num_null=0, num_blank=0, num_distinct=15, most_common=None, least_common=None)
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",763320133,sqlite-utils analyze-tables command and table.analyze_column() method,
https://github.com/simonw/sqlite-utils/pull/208#issuecomment-743956666,https://api.github.com/repos/simonw/sqlite-utils/issues/208,743956666,MDEyOklzc3VlQ29tbWVudDc0Mzk1NjY2Ng==,9599,simonw,2020-12-13T05:44:49Z,2020-12-13T05:44:49Z,OWNER,"Example output:

```
% sqlite-utils analyze-tables github.db tags
tags.repo: (1/3)

  Total rows: 261
  Null rows: 0
  Blank rows: 0

  Distinct values: 14

  Most common:
    88: 107914493
    75: 140912432
    27: 206156866
    21: 207052882
    17: 197431109
    8: 197882382
    5: 256834907
    5: 205429375
    4: 248903544
    3: 206202864

  Least common:
    1: 209590345
    2: 206649770
    2: 303218369
    3: 206202864
    3: 213286752
    4: 248903544
    5: 205429375
    5: 256834907
    8: 197882382
    17: 197431109

tags.name: (2/3)

  Total rows: 261
  Null rows: 0
  Blank rows: 0

  Distinct values: 175

  Most common:
    10: 0.2
    9: 0.1
    7: 0.3
    6: 0.4
    5: 0.7
    5: 0.5
    5: 0.1a
    4: 0.9
    4: 0.8
    4: 0.6

  Least common:
    1: 0.1.1
    1: 0.11.1
    1: 0.1a2
    1: 0.20.1
    1: 0.21.1
    1: 0.21.2
    1: 0.21.3
    1: 0.22
    1: 0.22.1
    1: 0.23

tags.sha: (3/3)

  Total rows: 261
  Null rows: 0
  Blank rows: 0

  Distinct values: 261
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",763320133,sqlite-utils analyze-tables command and table.analyze_column() method,