{"html_url": "https://github.com/simonw/sqlite-utils/pull/208#issuecomment-743707969", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/208", "id": 743707969, "node_id": "MDEyOklzc3VlQ29tbWVudDc0MzcwNzk2OQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-12-12T05:42:26Z", "updated_at": "2020-12-12T05:43:06Z", "author_association": "OWNER", "body": "Should truncate values in the least/most common JSON array to a sensible length, otherwise you end up with stuff like this:\r\n```json\r\n[\r\n [\r\n \"b'\\\\x00\\\\x05barry\\\\x03\\\\x01\\\\x02\\\\x00\\\\x00\\\\x03cat\\\\x03\\\\x01\\\\x03\\\\x00\\\\x00\\\\x03dog\\\\x08\\\\x01\\\\x01\\\\x01\\\\x03\\\\x00\\\\x01\\\\x03\\\\x00\\\\x00\\\\x07panther\\\\x05\\\\x01\\\\x01\\\\x02\\\\x02\\\\x00\\\\x01\\\\x03uma\\\\x05\\\\x02\\\\x01\\\\x02\\\\x02\\\\x00\\\\x00\\\\x04sara\\\\x05\\\\x02\\\\x01\\\\x01\\\\x02\\\\x00\\\\x00\\\\x05terry\\\\x08\\\\x01\\\\x01\\\\x01\\\\x02\\\\x00\\\\x01\\\\x02\\\\x00\\\\x00\\\\x06weasel\\\\x05\\\\x02\\\\x01\\\\x01\\\\x03\\\\x00'\",\r\n 1\r\n ]\r\n]\r\n```\r\nThis example also shows that binary values (like those in `_fts` tables) look a bit weird, but I think I'm OK with that since binary data can't be represented neatly in JSON anyway.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 763320133, "label": "sqlite-utils analyze-tables command and table.analyze_column() method"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/pull/208#issuecomment-743708080", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/208", "id": 743708080, "node_id": "MDEyOklzc3VlQ29tbWVudDc0MzcwODA4MA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-12-12T05:43:45Z", "updated_at": "2020-12-12T05:43:45Z", "author_association": "OWNER", "body": "CLI output looks like this at the moment, which is 
bad:\r\n```\r\n % sqlite-utils analyze-tables ../datasette/fixtures.db facetable\r\n1/10: ColumnDetails(table='facetable', column='pk', total_rows=15, num_null=0, num_blank=0, num_distinct=15, most_common=None, least_common=None)\r\n2/10: ColumnDetails(table='facetable', column='created', total_rows=15, num_null=0, num_blank=0, num_distinct=4, most_common=[('2019-01-17 08:00:00', 4), ('2019-01-15 08:00:00', 4), ('2019-01-14 08:00:00', 4), ('2019-01-16 08:00:00', 3)], least_common=[('2019-01-16 08:00:00', 3), ('2019-01-14 08:00:00', 4), ('2019-01-15 08:00:00', 4), ('2019-01-17 08:00:00', 4)])\r\n3/10: ColumnDetails(table='facetable', column='planet_int', total_rows=15, num_null=0, num_blank=0, num_distinct=2, most_common=[(1, 14), (2, 1)], least_common=[(2, 1), (1, 14)])\r\n4/10: ColumnDetails(table='facetable', column='on_earth', total_rows=15, num_null=0, num_blank=0, num_distinct=2, most_common=[(1, 14), (0, 1)], least_common=[(0, 1), (1, 14)])\r\n5/10: ColumnDetails(table='facetable', column='state', total_rows=15, num_null=0, num_blank=0, num_distinct=3, most_common=[('CA', 10), ('MI', 4), ('MC', 1)], least_common=[('MC', 1), ('MI', 4), ('CA', 10)])\r\n6/10: ColumnDetails(table='facetable', column='city_id', total_rows=15, num_null=0, num_blank=0, num_distinct=4, most_common=[(1, 6), (3, 4), (2, 4), (4, 1)], least_common=[(4, 1), (2, 4), (3, 4), (1, 6)])\r\n7/10: ColumnDetails(table='facetable', column='neighborhood', total_rows=15, num_null=0, num_blank=0, num_distinct=14, most_common=[('Downtown', 2), ('Tenderloin', 1), ('SOMA', 1), ('Mission', 1), ('Mexicantown', 1), ('Los Feliz', 1), ('Koreatown', 1), ('Hollywood', 1), ('Hayes Valley', 1), ('Greektown', 1)], least_common=[('Arcadia Planitia', 1), ('Bernal Heights', 1), ('Corktown', 1), ('Dogpatch', 1), ('Greektown', 1), ('Hayes Valley', 1), ('Hollywood', 1), ('Koreatown', 1), ('Los Feliz', 1), ('Mexicantown', 1)])\r\n8/10: ColumnDetails(table='facetable', column='tags', total_rows=15, num_null=0, 
num_blank=0, num_distinct=3, most_common=[('[]', 13), ('[\"tag1\", \"tag3\"]', 1), ('[\"tag1\", \"tag2\"]', 1)], least_common=[('[\"tag1\", \"tag2\"]', 1), ('[\"tag1\", \"tag3\"]', 1), ('[]', 13)])\r\n9/10: ColumnDetails(table='facetable', column='complex_array', total_rows=15, num_null=0, num_blank=0, num_distinct=2, most_common=[('[]', 14), ('[{\"foo\": \"bar\"}]', 1)], least_common=[('[{\"foo\": \"bar\"}]', 1), ('[]', 14)])\r\n10/10: ColumnDetails(table='facetable', column='distinct_some_null', total_rows=15, num_null=13, num_blank=0, num_distinct=2, most_common=[(None, 13), ('two', 1), ('one', 1)], least_common=[('one', 1), ('two', 1), (None, 13)])\r\n(sqlite-utils) sqlite-utils % \r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 763320133, "label": "sqlite-utils analyze-tables command and table.analyze_column() method"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/pull/208#issuecomment-743708169", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/208", "id": 743708169, "node_id": "MDEyOklzc3VlQ29tbWVudDc0MzcwODE2OQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-12-12T05:44:46Z", "updated_at": "2020-12-12T05:44:46Z", "author_association": "OWNER", "body": "If there are fewer than ten values, is it worth outputting them twice, once in `most_common` and then in reverse in `least_common`? 
Feels redundant - I think I should leave `least_common` empty in that case.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 763320133, "label": "sqlite-utils analyze-tables command and table.analyze_column() method"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/pull/208#issuecomment-743708325", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/208", "id": 743708325, "node_id": "MDEyOklzc3VlQ29tbWVudDc0MzcwODMyNQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-12-12T05:46:27Z", "updated_at": "2020-12-12T05:46:27Z", "author_association": "OWNER", "body": "It would be neat if you could optionally specify a subset of columns to analyze, using `-c` or `--column`.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 763320133, "label": "sqlite-utils analyze-tables command and table.analyze_column() method"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/pull/208#issuecomment-743708524", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/208", "id": 743708524, "node_id": "MDEyOklzc3VlQ29tbWVudDc0MzcwODUyNA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-12-12T05:48:20Z", "updated_at": "2020-12-12T05:48:32Z", "author_association": "OWNER", "body": "```\r\n% sqlite-utils analyze-tables ../datasette/fixtures.db facetable --column pk\r\n1/1: ColumnDetails(table='facetable', column='pk', total_rows=15, num_null=0, num_blank=0, num_distinct=15, most_common=None, least_common=None)\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 763320133, "label": "sqlite-utils 
analyze-tables command and table.analyze_column() method"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/pull/208#issuecomment-743956666", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/208", "id": 743956666, "node_id": "MDEyOklzc3VlQ29tbWVudDc0Mzk1NjY2Ng==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-12-13T05:44:49Z", "updated_at": "2020-12-13T05:44:49Z", "author_association": "OWNER", "body": "Example output:\r\n```\r\n% sqlite-utils analyze-tables github.db tags \r\ntags.repo: (1/3)\r\n\r\n Total rows: 261\r\n Null rows: 0\r\n Blank rows: 0\r\n\r\n Distinct values: 14\r\n\r\n Most common:\r\n 88: 107914493\r\n 75: 140912432\r\n 27: 206156866\r\n 21: 207052882\r\n 17: 197431109\r\n 8: 197882382\r\n 5: 256834907\r\n 5: 205429375\r\n 4: 248903544\r\n 3: 206202864\r\n\r\n Least common:\r\n 1: 209590345\r\n 2: 206649770\r\n 2: 303218369\r\n 3: 206202864\r\n 3: 213286752\r\n 4: 248903544\r\n 5: 205429375\r\n 5: 256834907\r\n 8: 197882382\r\n 17: 197431109\r\n\r\ntags.name: (2/3)\r\n\r\n Total rows: 261\r\n Null rows: 0\r\n Blank rows: 0\r\n\r\n Distinct values: 175\r\n\r\n Most common:\r\n 10: 0.2\r\n 9: 0.1\r\n 7: 0.3\r\n 6: 0.4\r\n 5: 0.7\r\n 5: 0.5\r\n 5: 0.1a\r\n 4: 0.9\r\n 4: 0.8\r\n 4: 0.6\r\n\r\n Least common:\r\n 1: 0.1.1\r\n 1: 0.11.1\r\n 1: 0.1a2\r\n 1: 0.20.1\r\n 1: 0.21.1\r\n 1: 0.21.2\r\n 1: 0.21.3\r\n 1: 0.22\r\n 1: 0.22.1\r\n 1: 0.23\r\n\r\ntags.sha: (3/3)\r\n\r\n Total rows: 261\r\n Null rows: 0\r\n Blank rows: 0\r\n\r\n Distinct values: 261\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 763320133, "label": "sqlite-utils analyze-tables command and table.analyze_column() method"}, "performed_via_github_app": null}