{"html_url": "https://github.com/simonw/sqlite-utils/issues/366#issuecomment-1007637963", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/366", "id": 1007637963, "node_id": "IC_kwDOCGYnMM48D1XL", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-07T18:30:13Z", "updated_at": "2022-01-07T18:30:13Z", "author_association": "OWNER", "body": "Annoyingly I use the word \"analyze\" to mean something else in the CLI - for these features:\r\n\r\n- #207 \r\n- #320\r\n\r\nthere's only one method with a similar name in the Python library though and that's this one:\r\n\r\nhttps://github.com/simonw/sqlite-utils/blob/6e46b9913411682f3a3ec66f4d58886c1db8654b/sqlite_utils/db.py#L2904-L2906", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1096563265, "label": "Python library methods for calling ANALYZE"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/366#issuecomment-1008158616", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/366", "id": 1008158616, "node_id": "IC_kwDOCGYnMM48F0eY", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-08T21:35:32Z", "updated_at": "2022-01-08T21:35:32Z", "author_association": "OWNER", "body": "Built a prototype in a branch, see #367.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1096563265, "label": "Python library methods for calling ANALYZE"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/366#issuecomment-1009285627", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/366", "id": 1009285627, "node_id": "IC_kwDOCGYnMM48KHn7", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-10T19:49:19Z", "updated_at": 
"2022-01-10T19:51:25Z", "author_association": "OWNER", "body": "Documentation for those two new methods: https://sqlite-utils.datasette.io/en/latest/python-api.html#optimizing-index-usage-with-analyze", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1096563265, "label": "Python library methods for calling ANALYZE"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/366#issuecomment-1007639860", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/366", "id": 1007639860, "node_id": "IC_kwDOCGYnMM48D100", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-07T18:32:59Z", "updated_at": "2022-01-07T18:33:07Z", "author_association": "OWNER", "body": "From the SQLite docs:\r\n\r\n> If no arguments are given, all attached databases are analyzed. If a schema name is given as the argument, then all tables and indices in that one database are analyzed. If the argument is a table name, then only that table and the indices associated with that table are analyzed. 
If the argument is an index name, then only that one index is analyzed.\r\n\r\nSo I think this becomes two methods:\r\n\r\n- `db.analyze()` calls analyze on the whole database\r\n- `db.analyze(name_of_table_or_index)` for a specific named table or index\r\n- `table.analyze()` is a shortcut for `db.analyze(table.name)`", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1096563265, "label": "Python library methods for calling ANALYZE"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/366#issuecomment-1009288898", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/366", "id": 1009288898, "node_id": "IC_kwDOCGYnMM48KIbC", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-10T19:54:04Z", "updated_at": "2022-01-10T19:54:04Z", "author_association": "OWNER", "body": "Having browsed the API reference I think the methods that would benefit from an `analyze=True` parameter are:\r\n\r\n- `db.create_index`\r\n- `table.insert_all`\r\n- `table.upsert_all`\r\n- `table.delete_where`", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1096563265, "label": "Python library methods for calling ANALYZE"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/366#issuecomment-1009273525", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/366", "id": 1009273525, "node_id": "IC_kwDOCGYnMM48KEq1", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-10T19:32:39Z", "updated_at": "2022-01-10T19:32:39Z", "author_association": "OWNER", "body": "I'm going to implement the Python library methods based on the prototype:\r\n```diff\r\ncommit 650f97a08f29a688c530e5f6c9eedc9269ed7bdc\r\nAuthor: Simon Willison \r\nDate: 
Sat Jan 8 13:34:01 2022 -0800\r\n\r\n    Initial prototype of .analyze(), refs #366\r\n\r\ndiff --git a/sqlite_utils/db.py b/sqlite_utils/db.py\r\nindex dfc4723..1348b4a 100644\r\n--- a/sqlite_utils/db.py\r\n+++ b/sqlite_utils/db.py\r\n@@ -923,6 +923,13 @@ class Database:\r\n         \"Run a SQLite ``VACUUM`` against the database.\"\r\n         self.execute(\"VACUUM;\")\r\n \r\n+    def analyze(self, name=None):\r\n+        \"Run ``ANALYZE`` against the entire database or a named table or index.\"\r\n+        sql = \"ANALYZE\"\r\n+        if name is not None:\r\n+            sql += \" [{}]\".format(name)\r\n+        self.execute(sql)\r\n+\r\n \r\n class Queryable:\r\n     def exists(self) -> bool:\r\n@@ -2902,6 +2909,10 @@ class Table(Queryable):\r\n         )\r\n         return self\r\n \r\n+    def analyze(self):\r\n+        \"Run ANALYZE against this table\"\r\n+        self.db.analyze(self.name)\r\n+\r\n     def analyze_column(\r\n         self, column: str, common_limit: int = 10, value_truncate=None, total_rows=None\r\n     ) -> \"ColumnDetails\":\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1096563265, "label": "Python library methods for calling ANALYZE"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/366#issuecomment-1008157132", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/366", "id": 1008157132, "node_id": "IC_kwDOCGYnMM48F0HM", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-08T21:23:08Z", "updated_at": "2022-01-08T21:25:05Z", "author_association": "OWNER", "body": "Running `ANALYZE` creates a new visible table called `sqlite_stat1`: https://www.sqlite.org/fileformat.html#the_sqlite_stat1_table\r\n\r\nThis should be added to the default list of hidden tables in Datasette.\r\n\r\nIt looks something like this:\r\n\r\n| tbl | idx | stat |\r\n|---------------------------------|------------------------------------|-----------|\r\n| _counts | 
sqlite_autoindex__counts_1 | 5 1 |\r\n| global-power-plants_fts_config | global-power-plants_fts_config | 1 1 |\r\n| global-power-plants_fts_docsize | | 33643 |\r\n| global-power-plants_fts_idx | global-power-plants_fts_idx | 199 40 1 |\r\n| global-power-plants_fts_data | | 136 |\r\n| global-power-plants | \"global-power-plants_owner\" | 33643 4 |\r\n| global-power-plants | \"global-power-plants_country_long\" | 33643 202 |\r\n\r\n> In each such row, the sqlite_stat.stat column will be a string consisting of a list of integers followed by zero or more arguments. The first integer in this list is the approximate number of rows in the index. (The number of rows in the index is the same as the number of rows in the table, except for partial indexes.) The second integer is the approximate number of rows in the index that have the same value in the first column of the index. The third integer is the number number of rows in the index that have the same value for the first two columns. The N-th integer (for N>1) is the estimated average number of rows in the index which have the same value for the first N-1 columns. For a K-column index, there will be K+1 integers in the stat column. If the index is unique, then the last integer will be 1. 
", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1096563265, "label": "Python library methods for calling ANALYZE"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/366#issuecomment-1007641634", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/366", "id": 1007641634, "node_id": "IC_kwDOCGYnMM48D2Qi", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-07T18:35:35Z", "updated_at": "2022-01-07T18:35:35Z", "author_association": "OWNER", "body": "Since the existing CLI feature is this:\r\n\r\n $ sqlite-utils analyze-tables github.db tags\r\n\r\nI can add `sqlite-utils analyze` to reflect the Python library method.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1096563265, "label": "Python library methods for calling ANALYZE"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/366#issuecomment-1009508865", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/366", "id": 1009508865, "node_id": "IC_kwDOCGYnMM48K-IB", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-11T01:08:51Z", "updated_at": "2022-01-11T01:08:51Z", "author_association": "OWNER", "body": "The Python methods are all done now, next step is the CLI options. 
I'll do those in a separate issue.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1096563265, "label": "Python library methods for calling ANALYZE"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/366#issuecomment-1009286373", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/366", "id": 1009286373, "node_id": "IC_kwDOCGYnMM48KHzl", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-10T19:50:22Z", "updated_at": "2022-01-10T19:50:22Z", "author_association": "OWNER", "body": "With respect to #365, I'm now thinking that having the ability to say \"... and then run ANALYZE\" could be useful for a bunch of Python methods. For example:\r\n\r\n```python\r\ndb[\"dogs\"].insert_all(list_of_dogs, analyze=True)\r\ndb[\"dogs\"].create_index([\"name\"], analyze=True)\r\n```\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1096563265, "label": "Python library methods for calling ANALYZE"}, "performed_via_github_app": null}