{"html_url": "https://github.com/simonw/sqlite-utils/issues/207#issuecomment-743701422", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/207", "id": 743701422, "node_id": "MDEyOklzc3VlQ29tbWVudDc0MzcwMTQyMg==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-12-12T04:37:14Z", "updated_at": "2020-12-12T04:38:25Z", "author_association": "OWNER", "body": "Prototype:\r\n```python\r\nfrom collections import namedtuple\r\n\r\nColumnDetails = namedtuple(\"ColumnDetails\", (\"column\", \"num_null\", \"num_blank\", \"num_distinct\", \"most_common\", \"least_common\"))\r\n\r\ndef analyze_column(db, table, column, values=10):\r\n    num_null = db.execute(\"select count(*) from [{}] where [{}] is null\".format(table, column)).fetchone()[0]\r\n    num_blank = db.execute(\"select count(*) from [{}] where [{}] = ''\".format(table, column)).fetchone()[0]\r\n    num_distinct = db.execute(\"select count(distinct [{}]) from [{}]\".format(column, table)).fetchone()[0]\r\n    most_common = None\r\n    least_common = None\r\n    if num_distinct != 1:\r\n        most_common = [(r[0], r[1]) for r in db.execute(\r\n            \"select [{}], count(*) from [{}] group by [{}] order by count(*) desc limit {}\".format(column, table, column, values)\r\n        ).fetchall()]\r\n        if num_distinct <= values:\r\n            # No need to run the query if it will just return the results in reverse order\r\n            least_common = most_common[::-1]\r\n        else:\r\n            least_common = [(r[0], r[1]) for r in db.execute(\r\n                \"select [{}], count(*) from [{}] group by [{}] order by count(*) limit {}\".format(column, table, column, values)\r\n            ).fetchall()]\r\n    return ColumnDetails(column, num_null, num_blank, num_distinct, most_common, least_common)\r\n\r\n\r\ndef analyze_table(db, table):\r\n    for column in db[table].columns:\r\n        details = analyze_column(db, table, column.name)\r\n        print(details)\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, 
\"eyes\": 0}", "issue": {"value": 763283616, "label": "sqlite-utils analyze-tables command"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/207#issuecomment-743701599", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/207", "id": 743701599, "node_id": "MDEyOklzc3VlQ29tbWVudDc0MzcwMTU5OQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-12-12T04:38:52Z", "updated_at": "2020-12-12T04:39:07Z", "author_association": "OWNER", "body": "I'll add a `table.analyze_column(column)` method which is used by the CLI tool - with a note that this is an unstable interface which may change in the future.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 763283616, "label": "sqlite-utils analyze-tables command"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/207#issuecomment-743701697", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/207", "id": 743701697, "node_id": "MDEyOklzc3VlQ29tbWVudDc0MzcwMTY5Nw==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-12-12T04:39:51Z", "updated_at": "2020-12-12T04:39:51Z", "author_association": "OWNER", "body": "CLI could be:\r\n\r\n    sqlite-utils analyze-tables\r\n\r\nTo analyze all tables or:\r\n\r\n    sqlite-utils analyze-tables table1 table2\r\n\r\nTo analyze specific tables.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 763283616, "label": "sqlite-utils analyze-tables command"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/207#issuecomment-743966801", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/207", "id": 743966801, "node_id": "MDEyOklzc3VlQ29tbWVudDc0Mzk2NjgwMQ==", 
"user": {"value": 9599, "label": "simonw"}, "created_at": "2020-12-13T07:25:23Z", "updated_at": "2020-12-13T07:25:23Z", "author_association": "OWNER", "body": "CLI documentation: https://sqlite-utils.readthedocs.io/en/latest/cli.html#analyzing-tables\r\n\r\nPython library documentation: https://sqlite-utils.readthedocs.io/en/latest/python-api.html#analyzing-a-column", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 763283616, "label": "sqlite-utils analyze-tables command"}, "performed_via_github_app": null}