issue_comments


8 rows where issue = 1560982210 sorted by updated_at descending


user 3

  • cldellow 4
  • simonw 3
  • codecov[bot] 1

author_association 3

  • CONTRIBUTOR 4
  • OWNER 3
  • NONE 1

issue 1

  • array facet: don't materialize unnecessary columns · 8
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions issue performed_via_github_app
1407733793 https://github.com/simonw/datasette/pull/2008#issuecomment-1407733793 https://api.github.com/repos/simonw/datasette/issues/2008 IC_kwDOBm6k_c5T6FAh simonw 9599 2023-01-29T18:17:40Z 2023-01-29T18:17:40Z OWNER

> We don't have any performance tests yet - would be a useful thing to add, I've not built anything like that before (at least not in CI, I've always done ad hoc performance testing using something like Locust) so I don't have a great feel for how it could work.

Had an interesting conversation about this just now: https://fedi.simonwillison.net/@simon/109773800944614366

There's a risk that different runs will return different results due to the shared resource nature of GitHub Actions runners, but a good fix for that is to run comparative tests where you run the benchmark against e.g. both main and the incoming PR branch and report back on any differences.
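The comparative idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `main_version` and `pr_version` are hypothetical stand-ins for "the code on main" and "the code on the PR branch", and a real CI job would need to check out and install both branches first. The point is that both timings come from the same run on the same runner, so only their ratio is reported.

```python
import timeit


def compare_timings(baseline, candidate, repeat=5, number=50):
    """Time two callables back to back and return (baseline_s, candidate_s, ratio)."""
    # Taking the minimum of several repeats reduces noise from other work on the runner.
    baseline_s = min(timeit.repeat(baseline, repeat=repeat, number=number))
    candidate_s = min(timeit.repeat(candidate, repeat=repeat, number=number))
    return baseline_s, candidate_s, candidate_s / baseline_s


# Hypothetical workloads standing in for the two branches (not Datasette code).
def main_version():
    return sorted(range(2000), reverse=True)


def pr_version():
    return list(range(1999, -1, -1))


baseline_s, candidate_s, ratio = compare_timings(main_version, pr_version)
print(f"main: {baseline_s:.6f}s  PR: {candidate_s:.6f}s  ratio: {ratio:.2f}")
```

A ratio well above 1.0 would suggest the PR is slower. Any threshold for flagging a regression would have to be tuned to the noise seen on the runners.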

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
array facet: don't materialize unnecessary columns 1560982210  
1407716963 https://github.com/simonw/datasette/pull/2008#issuecomment-1407716963 https://api.github.com/repos/simonw/datasette/issues/2008 IC_kwDOBm6k_c5T6A5j cldellow 193185 2023-01-29T17:04:03Z 2023-01-29T17:04:03Z CONTRIBUTOR

Performance tests - I think most places don't have them as a formal gate enforced by CI. TypeScript and scalac seem to have tests that run to capture timings. The timings are included by a bot as a comment or build check, and also stored in a database so you can graph changes over time to spot regressions. Probably overkill for Datasette!
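As a rough sketch of the "store timings in a database" approach, assuming a made-up `benchmark_timings` table and placeholder commit labels (none of this comes from TypeScript's or scalac's actual tooling):

```python
import sqlite3


def record_timing(conn, commit_sha, benchmark, seconds):
    """Store one timing sample so it can be graphed or compared later."""
    conn.execute(
        "create table if not exists benchmark_timings "
        "(commit_sha text, benchmark text, seconds real)"
    )
    conn.execute(
        "insert into benchmark_timings values (?, ?, ?)",
        (commit_sha, benchmark, seconds),
    )


def timing_history(conn, benchmark):
    """Return (commit_sha, seconds) pairs in insertion order."""
    return conn.execute(
        "select commit_sha, seconds from benchmark_timings "
        "where benchmark = ? order by rowid",
        (benchmark,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
# Placeholder commits and illustrative timings, not measured results.
record_timing(conn, "commit-a", "array_facet", 4.0)
record_timing(conn, "commit-b", "array_facet", 0.5)
print(timing_history(conn, "array_facet"))
```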

Window functions - oh, good point. Looks like Ubuntu shipped JSON1 support as far back as sqlite 3.11. I'll let this PR linger until there's a way to run against different SQLite versions. For now, I'm shipping this with datasette-ui-extras, since I think it's OK for a plugin to enforce a higher minimum requirement.

Tests - there actually did end up being test changes to capture the undercount bug of the current implementation, so the current implementation would fail against the new tests.

Perhaps a non-window function version could be written that uses random() instead of row_number() over () in order to get a unique key. It's technically not unique, but in practice, I imagine it'll work well.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
array facet: don't materialize unnecessary columns 1560982210  
1407568923 https://github.com/simonw/datasette/pull/2008#issuecomment-1407568923 https://api.github.com/repos/simonw/datasette/issues/2008 IC_kwDOBm6k_c5T5cwb simonw 9599 2023-01-29T05:47:36Z 2023-01-29T05:47:36Z OWNER

> I don't know how/if you do automated tests for performance, so I haven't changed any of the tests.

We don't have any performance tests yet - would be a useful thing to add, I've not built anything like that before (at least not in CI, I've always done ad hoc performance testing using something like Locust) so I don't have a great feel for how it could work.

I see not having to change the tests at all for this change as a really positive sign. If you find any behaviour differences between this and the previous implementation, that's a sign we should add another test or two specifying the behaviour we want.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
array facet: don't materialize unnecessary columns 1560982210  
1407567753 https://github.com/simonw/datasette/pull/2008#issuecomment-1407567753 https://api.github.com/repos/simonw/datasette/issues/2008 IC_kwDOBm6k_c5T5ceJ simonw 9599 2023-01-29T05:39:54Z 2023-01-29T05:40:34Z OWNER

I absolutely love this performance boost - really nice find.

One concern: this will be the first time Datasette ships a core feature that uses window functions.

Window functions were added to SQLite in version 3.25.0 on 2018-09-15 - which means it's still very common for Datasette to run on versions that don't yet support them.

So I see two options:

- Detect window function support and switch between the old implementation and this better, new one
- Detect window functions and disable the facet-by-JSON feature entirely if they are missing

I like the first option a bit better.
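A minimal sketch of how that detection might look, using only Python's standard `sqlite3` module. The function names and the `"window"` / `"legacy"` labels are hypothetical and are not part of Datasette. The probe simply tries a trivial window query and treats an error as "not supported".

```python
import sqlite3


def supports_window_functions(conn):
    """Return True if this connection's SQLite can run a window function."""
    try:
        conn.execute("select row_number() over ()").fetchall()
        return True
    except sqlite3.OperationalError:
        # Older SQLite builds reject the OVER clause as a syntax error.
        return False


def choose_array_facet_strategy(conn):
    """Pick between two hypothetical implementations (the first option above)."""
    return "window" if supports_window_functions(conn) else "legacy"


conn = sqlite3.connect(":memory:")
print(sqlite3.sqlite_version, choose_array_facet_strategy(conn))
```

Probing the feature directly, rather than comparing `sqlite3.sqlite_version_info` against `(3, 25, 0)`, also covers builds where the feature was compiled out.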

This also leads to a tricky CI challenge: Datasette needs to be able to run its test suite against more than one SQLite version to confidently test this feature going forward.

I don't yet have a good GitHub Actions recipe for this, but I really need one - for sqlite-utils too.

Might be able to use this trick for that: https://til.simonwillison.net/sqlite/ld-preload

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
array facet: don't materialize unnecessary columns 1560982210  
1407471459 https://github.com/simonw/datasette/pull/2008#issuecomment-1407471459 https://api.github.com/repos/simonw/datasette/issues/2008 IC_kwDOBm6k_c5T5E9j codecov[bot] 22429695 2023-01-28T19:40:18Z 2023-01-29T04:55:39Z NONE

Codecov Report

Base: 92.11% // Head: 91.78% // Decreases project coverage by -0.34% :warning:

Coverage data is based on head (f529a30) compared to base (e4ebef0). Patch has no changes to coverable lines.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main    #2008      +/-   ##
==========================================
- Coverage   92.11%   91.78%   -0.34%
==========================================
  Files          38       39       +1
  Lines        5555     5599      +44
==========================================
+ Hits         5117     5139      +22
- Misses        438      460      +22
```

| [Impacted Files](https://codecov.io/gh/simonw/datasette/pull/2008?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison) | Coverage Δ | |
|---|---|---|
| [datasette/facets.py](https://codecov.io/gh/simonw/datasette/pull/2008?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison#diff-ZGF0YXNldHRlL2ZhY2V0cy5weQ==) | `91.84% <ø> (ø)` | |
| [datasette/views/row.py](https://codecov.io/gh/simonw/datasette/pull/2008?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison#diff-ZGF0YXNldHRlL3ZpZXdzL3Jvdy5weQ==) | `87.82% <0.00%> (ø)` | |
| [datasette/views/table.py](https://codecov.io/gh/simonw/datasette/pull/2008?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison#diff-ZGF0YXNldHRlL3ZpZXdzL3RhYmxlLnB5) | `92.57% <0.00%> (ø)` | |
| [datasette/views/database.py](https://codecov.io/gh/simonw/datasette/pull/2008?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison#diff-ZGF0YXNldHRlL3ZpZXdzL2RhdGFiYXNlLnB5) | `96.61% <0.00%> (ø)` | |
| [datasette/utils/shutil\_backport.py](https://codecov.io/gh/simonw/datasette/pull/2008?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison#diff-ZGF0YXNldHRlL3V0aWxzL3NodXRpbF9iYWNrcG9ydC5weQ==) | `9.09% <0.00%> (ø)` | |
| [datasette/cli.py](https://codecov.io/gh/simonw/datasette/pull/2008?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison#diff-ZGF0YXNldHRlL2NsaS5weQ==) | `82.40% <0.00%> (+2.77%)` | :arrow_up: |
| [datasette/plugins.py](https://codecov.io/gh/simonw/datasette/pull/2008?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison#diff-ZGF0YXNldHRlL3BsdWdpbnMucHk=) | `85.29% <0.00%> (+2.94%)` | :arrow_up: |
| [datasette/utils/asgi.py](https://codecov.io/gh/simonw/datasette/pull/2008?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison#diff-ZGF0YXNldHRlL3V0aWxzL2FzZ2kucHk=) | `93.12% <0.00%> (+3.05%)` | :arrow_up: |

Help us with your feedback. Take ten seconds to tell us [how you rate us](https://about.codecov.io/nps?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison). Have a feature suggestion? [Share it here.](https://app.codecov.io/gh/feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Simon+Willison)

:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
array facet: don't materialize unnecessary columns 1560982210  
1407561308 https://github.com/simonw/datasette/pull/2008#issuecomment-1407561308 https://api.github.com/repos/simonw/datasette/issues/2008 IC_kwDOBm6k_c5T5a5c cldellow 193185 2023-01-29T04:50:50Z 2023-01-29T04:50:50Z CONTRIBUTOR

I pushed a revised version which ends up being faster -- the example which currently takes 4 seconds now runs in 500ms.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
array facet: don't materialize unnecessary columns 1560982210  
1407558284 https://github.com/simonw/datasette/pull/2008#issuecomment-1407558284 https://api.github.com/repos/simonw/datasette/issues/2008 IC_kwDOBm6k_c5T5aKM cldellow 193185 2023-01-29T04:23:58Z 2023-01-29T04:24:27Z CONTRIBUTOR

Ack, this PR is broken. I see now that the inner.* is necessary for ensuring the correct count in the face of rows having duplicate values in views.

That fixes the overcounting, but I think can undercount when the rows have the same data, eg a view like:

    SELECT '["bar"]' tags UNION ALL SELECT '["bar"]'

will produce a count of {"bar": 1 }, when it should be {"bar": 2}. In fact, this could apply in tables without primary keys, too.

If inner came from a base table that had a primary key or a rowid, we could use those column(s) to solve that case.

I guess a general solution would be to compute a window function so we have a distinct ID for each row. Will fiddle to see if I can get that working.
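The undercount and the window-function idea can be reproduced with Python's `sqlite3` module. This is a simplified sketch, not the SQL from the PR: the view name `v`, the CTE names `src` and `deduped`, and the column `rn` are made up for illustration, and it assumes a SQLite build with JSON1 and window functions (3.25.0 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The two-row view from the example above: both rows hold identical data.
conn.execute(
    """create view v as select '["bar"]' tags union all select '["bar"]'"""
)

# Deduplicating on the full row contents collapses the two identical rows.
DEDUPE_ON_ROW_CONTENTS = """
with src as (select * from v),
deduped as (
    select distinct j.value, src.*
    from src, json_each(src.tags) j
)
select value, count(*) from deduped group by value
"""

# Giving each row a synthetic ID keeps identical rows apart, while DISTINCT
# still removes a value repeated within a single row's array.
DEDUPE_ON_ROW_NUMBER = """
with src as (select row_number() over () as rn, tags from v),
deduped as (
    select distinct src.rn, j.value
    from src, json_each(src.tags) j
)
select value, count(*) from deduped group by value
"""

undercounted = conn.execute(DEDUPE_ON_ROW_CONTENTS).fetchall()
corrected = conn.execute(DEDUPE_ON_ROW_NUMBER).fetchall()
print(undercounted)  # [('bar', 1)]
print(corrected)     # [('bar', 2)]
```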

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
array facet: don't materialize unnecessary columns 1560982210  
1407470429 https://github.com/simonw/datasette/pull/2008#issuecomment-1407470429 https://api.github.com/repos/simonw/datasette/issues/2008 IC_kwDOBm6k_c5T5Etd cldellow 193185 2023-01-28T19:34:29Z 2023-01-28T19:34:29Z CONTRIBUTOR

I don't know how/if you do automated tests for performance, so I haven't changed any of the tests.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
array facet: don't materialize unnecessary columns 1560982210  


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
, [performed_via_github_app] TEXT);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);
Powered by Datasette · Queries took 23.064ms · About: github-to-sqlite