github
html_url | issue_url | id | node_id | user | created_at | updated_at | author_association | body | reactions | issue | performed_via_github_app |
---|---|---|---|---|---|---|---|---|---|---|---|
https://github.com/simonw/sqlite-utils/issues/207#issuecomment-743966801 | https://api.github.com/repos/simonw/sqlite-utils/issues/207 | 743966801 | MDEyOklzc3VlQ29tbWVudDc0Mzk2NjgwMQ== | 9599 | 2020-12-13T07:25:23Z | 2020-12-13T07:25:23Z | OWNER | CLI documentation: https://sqlite-utils.readthedocs.io/en/latest/cli.html#analyzing-tables Python library documentation: https://sqlite-utils.readthedocs.io/en/latest/python-api.html#analyzing-a-column | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
763283616 | |
https://github.com/simonw/sqlite-utils/issues/207#issuecomment-743701697 | https://api.github.com/repos/simonw/sqlite-utils/issues/207 | 743701697 | MDEyOklzc3VlQ29tbWVudDc0MzcwMTY5Nw== | 9599 | 2020-12-12T04:39:51Z | 2020-12-12T04:39:51Z | OWNER | CLI could be: sqlite-utils analyze-tables To analyze all tables or: sqlite-utils analyze-tables table1 table2 To analyze specific tables. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
763283616 | |
https://github.com/simonw/sqlite-utils/issues/207#issuecomment-743701599 | https://api.github.com/repos/simonw/sqlite-utils/issues/207 | 743701599 | MDEyOklzc3VlQ29tbWVudDc0MzcwMTU5OQ== | 9599 | 2020-12-12T04:38:52Z | 2020-12-12T04:39:07Z | OWNER | I'll add a `table.analyze_column(column)` method which is used by the CLI tool - with a note that this is an unstable interface which may change in the future. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
763283616 | |
https://github.com/simonw/sqlite-utils/issues/207#issuecomment-743701422 | https://api.github.com/repos/simonw/sqlite-utils/issues/207 | 743701422 | MDEyOklzc3VlQ29tbWVudDc0MzcwMTQyMg== | 9599 | 2020-12-12T04:37:14Z | 2020-12-12T04:38:25Z | OWNER | Prototype: ```python from collections import namedtuple ColumnDetails = namedtuple("ColumnDetails", ("column", "num_null", "num_blank", "num_distinct", "most_common", "least_common")) def analyze_column(db, table, column, values=10): num_null = db.execute("select count(*) from [{}] where [{}] is null".format(table, column)).fetchone()[0] num_blank = db.execute("select count(*) from [{}] where [{}] = ''".format(table, column)).fetchone()[0] num_distinct = db.execute("select count(distinct [{}]) from [{}]".format(column, table)).fetchone()[0] most_common = None least_common = None if num_distinct != 1: most_common = [(r[0], r[1]) for r in db.execute( "select [{}], count(*) from [{}] group by [{}] order by count(*) desc limit ".format(column, table, column, values) ).fetchall()] if num_distinct <= values: # No need to run the query if it will just return the results in revers order least_common = most_common[::-1] else: least_common = [(r[0], r[1]) for r in db.execute( "select [{}], count(*) from [{}] group by [{}] order by count(*) limit {}".format(column, table, column, values) ).fetchall()] return ColumnDetails(column, num_null, num_blank, num_distinct, most_common, least_common) def analyze_table(db, table): for column in db[table].columns: details = analyze_column(db, table, column.name) print(details) ``` | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
763283616 |