{"html_url": "https://github.com/simonw/datasette/pull/2008#issuecomment-1407733793", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/2008", "id": 1407733793, "node_id": "IC_kwDOBm6k_c5T6FAh", "user": {"value": 9599, "label": "simonw"}, "created_at": "2023-01-29T18:17:40Z", "updated_at": "2023-01-29T18:17:40Z", "author_association": "OWNER", "body": "> We don't have any performance tests yet - would be a useful thing to add, I've not built anything like that before (at least not in CI, I've always done ad-hoc performance testing using something like Locust) so I don't have a great feel for how it could work.\r\n\r\nHad an interesting conversation about this just now: https://fedi.simonwillison.net/@simon/109773800944614366\r\n\r\nThere's a risk that different runs will return different results due to the shared resource nature of GitHub Actions runners, but a good fix for that is to run comparative tests where you run the benchmark against e.g. both `main` and the incoming PR branch and report back on any differences.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1560982210, "label": "array facet: don't materialize unnecessary columns"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/pull/2008#issuecomment-1407716963", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/2008", "id": 1407716963, "node_id": "IC_kwDOBm6k_c5T6A5j", "user": {"value": 193185, "label": "cldellow"}, "created_at": "2023-01-29T17:04:03Z", "updated_at": "2023-01-29T17:04:03Z", "author_association": "CONTRIBUTOR", "body": "Performance tests - I think most places don't have them as a formal gate enforced by CI. TypeScript and scalac seem to have tests that run to capture timings. 
The timings are included by a bot as a comment or build check, and also stored in a database so you can graph changes over time to spot regressions. Probably overkill for Datasette!\r\n\r\nWindow functions - oh, good point. Looks like Ubuntu shipped JSON1 support as far back as sqlite 3.11. I'll let this PR linger until there's a way to run against different SQLite versions. For now, I'm shipping this with `datasette-ui-extras`, since I think it's OK for a plugin to enforce a higher minimum requirement.\r\n\r\nTests - there actually did end up being test changes to capture the undercount bug of the current implementation, so the current implementation would fail against the new tests.\r\n\r\nPerhaps a non-window function version could be written that uses `random()` instead of `row_number() over ()` in order to get a unique key. It's technically not unique, but in practice, I imagine it'll work well.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1560982210, "label": "array facet: don't materialize unnecessary columns"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/pull/2008#issuecomment-1407568923", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/2008", "id": 1407568923, "node_id": "IC_kwDOBm6k_c5T5cwb", "user": {"value": 9599, "label": "simonw"}, "created_at": "2023-01-29T05:47:36Z", "updated_at": "2023-01-29T05:47:36Z", "author_association": "OWNER", "body": "> I don't know how/if you do automated tests for performance, so I haven't changed any of the tests.\r\n\r\nWe don't have any performance tests yet - would be a useful thing to add, I've not built anything like that before (at least not in CI, I've always done ad-hoc performance testing using something like Locust) so I don't have a great feel for how it could work.\r\n\r\nI see not having to change the tests at all for this change 
as a really positive sign. If you find any behaviour differences between this and the previous that's a sign we should add another test or two specifying the behaviour we want.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1560982210, "label": "array facet: don't materialize unnecessary columns"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/pull/2008#issuecomment-1407567753", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/2008", "id": 1407567753, "node_id": "IC_kwDOBm6k_c5T5ceJ", "user": {"value": 9599, "label": "simonw"}, "created_at": "2023-01-29T05:39:54Z", "updated_at": "2023-01-29T05:40:34Z", "author_association": "OWNER", "body": "I absolutely _love_ this performance boost - really nice find.\r\n\r\nOne concern: this will be the first time Datasette ships a core feature that uses window functions.\r\n\r\nWindow functions were added to SQLite in [version 3.25.0](https://www.sqlite.org/releaselog/3_25_0.html) on 2018-09-15 - which means it's still very common for Datasette to run on versions that don't yet support them.\r\n\r\nSo I see two options:\r\n- Detect window function support and switch between the old implementation and this better, new one\r\n- Detect window functions and disable the facet-by-JSON feature entirely if they are missing\r\n\r\nI like the first option a bit better.\r\n\r\nThis also leads to a tricky CI challenge: Datasette needs to be able to run its test suite against more than one SQLite version to confidently test this feature going forward.\r\n\r\nI don't yet have a good GitHub Actions recipe for this, but I _really_ need one - for `sqlite-utils` too.\r\n\r\nMight be able to use this trick for that: https://til.simonwillison.net/sqlite/ld-preload", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, 
\"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1560982210, "label": "array facet: don't materialize unnecessary columns"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/pull/2008#issuecomment-1407471459", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/2008", "id": 1407471459, "node_id": "IC_kwDOBm6k_c5T5E9j", "user": {"value": 22429695, "label": "codecov[bot]"}, "created_at": "2023-01-28T19:40:18Z", "updated_at": "2023-01-29T04:55:39Z", "author_association": "NONE", "body": "# [Codecov](https://codecov.io/gh/simonw/datasette/pull/2008?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison) Report\nBase: **92.11**% // Head: **91.78**% // Decreases project coverage by **`-0.34%`** :warning:\n> Coverage data is based on head [(`f529a30`)](https://codecov.io/gh/simonw/datasette/pull/2008?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison) compared to base [(`e4ebef0`)](https://codecov.io/gh/simonw/datasette/commit/e4ebef082de90db4e1b8527abc0d582b7ae0bc9d?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison).\n> Patch has no changes to coverable lines.\n\n
Additional details and impacted files\n\n\n```diff\n@@ Coverage Diff @@\n## main #2008 +/- ##\n==========================================\n- Coverage 92.11% 91.78% -0.34% \n==========================================\n Files 38 39 +1 \n Lines 5555 5599 +44 \n==========================================\n+ Hits 5117 5139 +22 \n- Misses 438 460 +22 \n```\n\n\n| [Impacted Files](https://codecov.io/gh/simonw/datasette/pull/2008?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison) | Coverage \u0394 | |\n|---|---|---|\n| [datasette/facets.py](https://codecov.io/gh/simonw/datasette/pull/2008?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison#diff-ZGF0YXNldHRlL2ZhY2V0cy5weQ==) | `91.84% <\u00f8> (\u00f8)` | |\n| [datasette/views/row.py](https://codecov.io/gh/simonw/datasette/pull/2008?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison#diff-ZGF0YXNldHRlL3ZpZXdzL3Jvdy5weQ==) | `87.82% <0.00%> (\u00f8)` | |\n| [datasette/views/table.py](https://codecov.io/gh/simonw/datasette/pull/2008?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison#diff-ZGF0YXNldHRlL3ZpZXdzL3RhYmxlLnB5) | `92.57% <0.00%> (\u00f8)` | |\n| [datasette/views/database.py](https://codecov.io/gh/simonw/datasette/pull/2008?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison#diff-ZGF0YXNldHRlL3ZpZXdzL2RhdGFiYXNlLnB5) | `96.61% <0.00%> (\u00f8)` | |\n| [datasette/utils/shutil\\_backport.py](https://codecov.io/gh/simonw/datasette/pull/2008?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison#diff-ZGF0YXNldHRlL3V0aWxzL3NodXRpbF9iYWNrcG9ydC5weQ==) | `9.09% <0.00%> (\u00f8)` | |\n| 
[datasette/cli.py](https://codecov.io/gh/simonw/datasette/pull/2008?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison#diff-ZGF0YXNldHRlL2NsaS5weQ==) | `82.40% <0.00%> (+2.77%)` | :arrow_up: |\n| [datasette/plugins.py](https://codecov.io/gh/simonw/datasette/pull/2008?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison#diff-ZGF0YXNldHRlL3BsdWdpbnMucHk=) | `85.29% <0.00%> (+2.94%)` | :arrow_up: |\n| [datasette/utils/asgi.py](https://codecov.io/gh/simonw/datasette/pull/2008?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison#diff-ZGF0YXNldHRlL3V0aWxzL2FzZ2kucHk=) | `93.12% <0.00%> (+3.05%)` | :arrow_up: |\n\nHelp us with your feedback. Take ten seconds to tell us [how you rate us](https://about.codecov.io/nps?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison). Have a feature suggestion? [Share it here.](https://app.codecov.io/gh/feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison)\n\n
\n\n[:umbrella: View full report at Codecov](https://codecov.io/gh/simonw/datasette/pull/2008?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison). \n:loudspeaker: Do you have feedback about the report comment? [Let us know in this issue](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison).\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1560982210, "label": "array facet: don't materialize unnecessary columns"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/pull/2008#issuecomment-1407561308", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/2008", "id": 1407561308, "node_id": "IC_kwDOBm6k_c5T5a5c", "user": {"value": 193185, "label": "cldellow"}, "created_at": "2023-01-29T04:50:50Z", "updated_at": "2023-01-29T04:50:50Z", "author_association": "CONTRIBUTOR", "body": "I pushed a revised version which ends up being faster -- the example which currently takes 4 seconds now runs in 500ms.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1560982210, "label": "array facet: don't materialize unnecessary columns"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/pull/2008#issuecomment-1407558284", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/2008", "id": 1407558284, "node_id": "IC_kwDOBm6k_c5T5aKM", "user": {"value": 193185, "label": "cldellow"}, "created_at": "2023-01-29T04:23:58Z", "updated_at": "2023-01-29T04:24:27Z", "author_association": "CONTRIBUTOR", "body": "Ack, this PR is broken. 
I see now that the `inner.*` is necessary for ensuring the correct count in the face of rows having duplicate values in views.\r\n\r\nThat fixes the overcounting, but I think can undercount when the rows have the same data, eg a view like:\r\n\r\n```sql\r\nSELECT '[\"bar\"]' tags UNION ALL SELECT '[\"bar\"]'\r\n```\r\n\r\nwill produce a count of `{\"bar\": 1 }`, when it should be `{\"bar\": 2}`. In fact, this could apply in tables without primary keys, too.\r\n\r\nIf `inner` came from a base table that had a primary key or a rowid, we could use those column(s) to solve that case.\r\n\r\nI guess a general solution would be to compute a window function so we have a distinct ID for each row. Will fiddle to see if I can get that working.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1560982210, "label": "array facet: don't materialize unnecessary columns"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/pull/2008#issuecomment-1407470429", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/2008", "id": 1407470429, "node_id": "IC_kwDOBm6k_c5T5Etd", "user": {"value": 193185, "label": "cldellow"}, "created_at": "2023-01-28T19:34:29Z", "updated_at": "2023-01-28T19:34:29Z", "author_association": "CONTRIBUTOR", "body": "I don't know how/if you do automated tests for performance, so I haven't changed any of the tests.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1560982210, "label": "array facet: don't materialize unnecessary columns"}, "performed_via_github_app": null}