{"html_url": "https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1009548580", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/365", "id": 1009548580, "node_id": "IC_kwDOCGYnMM48LH0k", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-01-11T02:43:34Z", "updated_at": "2022-01-11T02:43:34Z", "author_association": "CONTRIBUTOR", "body": "thanks so much! always a pleasure to see how you work through these things", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1096558279, "label": "create-index should run analyze after creating index"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1009521921", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/365", "id": 1009521921, "node_id": "IC_kwDOCGYnMM48LBUB", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-11T01:37:53Z", "updated_at": "2022-01-11T01:37:53Z", "author_association": "OWNER", "body": "I decided to go with making this opt-in, mainly for consistency with the other places where I added this feature - see:\r\n- #379 \r\n- #366\r\n\r\nYou can now run the following:\r\n\r\n sqlite-utils create-index mydb.db mytable mycolumn --analyze\r\n\r\nAnd ``ANALYZE`` will be run on the index once it has been created.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1096558279, "label": "create-index should run analyze after creating index"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1008275546", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/365", "id": 1008275546, "node_id": "IC_kwDOCGYnMM48GRBa", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-01-09T11:01:15Z", "updated_at": "2022-01-09T13:37:51Z", "author_association": "CONTRIBUTOR", "body": "i don\u2019t want to be such a partisan for analyze, but the query planner deciding *not* to use an index based on information collected by analyze is not necessarily a bug, but could be the correct choice.\r\n\r\nthe original poster in that stack overflow doesn\u2019t say there\u2019s a performance regression ", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1096558279, "label": "create-index should run analyze after creating index"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1008229839", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/365", "id": 1008229839, "node_id": "IC_kwDOCGYnMM48GF3P", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-09T04:51:44Z", "updated_at": "2022-01-09T04:51:44Z", "author_association": "OWNER", "body": "Found one report on Stack Overflow from 9 years ago of someone seeing broken performance after running `ANALYZE`, hard to say that's a trend and not a single weird edge-case though! https://stackoverflow.com/q/12947214/6083", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1096558279, "label": "create-index should run analyze after creating index"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1008163585", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/365", "id": 1008163585, "node_id": "IC_kwDOCGYnMM48F1sB", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-08T22:14:39Z", "updated_at": "2022-01-09T03:03:07Z", "author_association": "OWNER", "body": "The reason I'm hesitating on this is that I've not actually used ANALYZE at all in nearly five years of messing around with SQLite! So I'm nervous that there are surprise downsides I haven't thought of.\r\n\r\nMy hunch is that ANALYZE is only worth worrying about on much larger databases, in which case I'm OK supporting it as a thoroughly documented power-user feature rather than a default.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1096558279, "label": "create-index should run analyze after creating index"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1008166084", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/365", "id": 1008166084, "node_id": "IC_kwDOCGYnMM48F2TE", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-01-08T22:32:47Z", "updated_at": "2022-01-08T22:32:47Z", "author_association": "CONTRIBUTOR", "body": "or using \u201c pragma optimize\u201d", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1096558279, "label": "create-index should run analyze after creating index"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1008164786", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/365", "id": 1008164786, "node_id": "IC_kwDOCGYnMM48F1-y", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-01-08T22:24:19Z", "updated_at": "2022-01-08T22:24:19Z", "author_association": "CONTRIBUTOR", "body": "the out-of-date scenario you describe could be addressed by automatically adding an analyze to the insert or convert commands if they implicate an index", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1096558279, "label": "create-index should run analyze after creating index"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1008164116", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/365", "id": 1008164116, "node_id": "IC_kwDOCGYnMM48F10U", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-01-08T22:18:57Z", "updated_at": "2022-01-08T22:18:57Z", "author_association": "CONTRIBUTOR", "body": "the table with the query ran so bad was about 50k. \r\n\r\ni think the scenario should not be worse than no stats. \r\n\r\ni also did not know that sqlite was so different from postgres and needed an explicit analyze call.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1096558279, "label": "create-index should run analyze after creating index"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1008163050", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/365", "id": 1008163050, "node_id": "IC_kwDOCGYnMM48F1jq", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-08T22:10:51Z", "updated_at": "2022-01-08T22:10:51Z", "author_association": "OWNER", "body": "Is there a downside to having a `sqlite_stat1` table if it has wildly incorrect statistics in it?\r\n\r\nImagine the following sequence of events:\r\n\r\n- User imports a few records, creating the table, using `sqlite-utils insert`\r\n- User runs `sqlite-utils create-index ...` which also creates and populates the `sqlite_stat1` table\r\n- User runs `insert` again to populate several million new records\r\n\r\nThe user now has a database file with several million records and a statistics table that is wildly out of date, having been populated when they only had a few.\r\n\r\nWill this result in surprisingly bad query performance compared to it that statistics table did not exist at all?\r\n\r\nIf so, I lean much harder towards `ANALYZE` as a strictly opt-in optimization, maybe with the `--analyze` option added to `sqlite-utils insert` top to help users opt in to updating their statistics after running big inserts.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1096558279, "label": "create-index should run analyze after creating index"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1008161965", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/365", "id": 1008161965, "node_id": "IC_kwDOCGYnMM48F1St", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-01-08T22:02:56Z", "updated_at": "2022-01-08T22:02:56Z", "author_association": "CONTRIBUTOR", "body": "for options 2 and 3, i would worry about discoverablity. \r\n\r\nin other db\u2019s it is not necessary to explicitly call analyze for most indices. ie for postgres\r\n\r\n> The system regularly collects statistics on all of a table's columns. Newly-created non-expression indexes can immediately use these statistics to determine an index's usefulness.\r\n\r\ni suppose i would propose raising a warning if the stats table is created that explains what is going on and informs users about a \u2014no-analyze argument.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1096558279, "label": "create-index should run analyze after creating index"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1008158357", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/365", "id": 1008158357, "node_id": "IC_kwDOCGYnMM48F0aV", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-08T21:33:07Z", "updated_at": "2022-01-08T21:33:07Z", "author_association": "OWNER", "body": "The one thing that worries me a little bit about doing this by default is that it adds a surprising new table to the database - it may be confusing to users if they run `create-index` and their database suddenly has a new `sqlite_stat1` table, see https://github.com/simonw/sqlite-utils/issues/366#issuecomment-1008157132\r\n\r\nOptions here are:\r\n\r\n- Do it anyway. People can tolerate a surprise table appearing when they create an index.\r\n- Only run `ANALYZE` if the user says `sqlite-utils create-index ... --analyze`\r\n- Use the `--analyze` option, but also automatically run `ANALYZE` if they create an index and the database they are working with already has a `sqlite_stat1` table\r\n\r\nI'm currently leading towards that third option - @fgregg any thoughts?", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1096558279, "label": "create-index should run analyze after creating index"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1007643254", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/365", "id": 1007643254, "node_id": "IC_kwDOCGYnMM48D2p2", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-07T18:37:56Z", "updated_at": "2022-01-07T18:37:56Z", "author_association": "OWNER", "body": "Or I could leave off `--no-analyze` and tell people that if they want to add an index without running analyze they can execute the `CREATE INDEX` themselves.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1096558279, "label": "create-index should run analyze after creating index"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1007642831", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/365", "id": 1007642831, "node_id": "IC_kwDOCGYnMM48D2jP", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-07T18:37:18Z", "updated_at": "2022-01-07T18:37:18Z", "author_association": "OWNER", "body": "After implementing #366 I can make it so `sqlite-utils create-index` automatically runs `db.analyze(index_name)` afterwards, maybe with a `--no-analyze` option in case anyone wants to opt out of that for specific performance reasons.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1096558279, "label": "create-index should run analyze after creating index"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1007636709", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/365", "id": 1007636709, "node_id": "IC_kwDOCGYnMM48D1Dl", "user": {"value": 536941, "label": "fgregg"}, "created_at": "2022-01-07T18:28:33Z", "updated_at": "2022-01-07T18:29:43Z", "author_association": "CONTRIBUTOR", "body": "i added an index to one table with sqlite-utils, and then a query that used to take about 1 second started taking hundreds of seconds. \r\n\r\nrunning analyze got me back to sub second speed.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1096558279, "label": "create-index should run analyze after creating index"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1007634999", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/365", "id": 1007634999, "node_id": "IC_kwDOCGYnMM48D0o3", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-07T18:26:22Z", "updated_at": "2022-01-07T18:26:22Z", "author_association": "OWNER", "body": "I've not used the `ANALYZE` feature in SQLite at all before. Should probably add Python library methods for it.\r\n\r\nAnnoyingly I use the word \"analyze\" to mean something else in the CLI - for these features:\r\n- #207 \r\n- #320", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1096558279, "label": "create-index should run analyze after creating index"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1007633376", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/365", "id": 1007633376, "node_id": "IC_kwDOCGYnMM48D0Pg", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-01-07T18:24:07Z", "updated_at": "2022-01-07T18:24:07Z", "author_association": "OWNER", "body": "Relevant documentation: https://www.sqlite.org/lang_analyze.html", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1096558279, "label": "create-index should run analyze after creating index"}, "performed_via_github_app": null}