home / github / issues

Menu
  • Search all tables
  • GraphQL API

issues: 944870799

This data as json

id node_id number title user state locked assignee milestone comments created_at updated_at closed_at author_association pull_request body repo type active_lock_reason performed_via_github_app reactions draft state_reason
944870799 MDU6SXNzdWU5NDQ4NzA3OTk= 1394 Big performance boost on faceting: skip the inner order by 9599 closed 0     4 2021-07-14T23:32:29Z 2021-07-16T02:23:32Z 2021-07-15T00:05:50Z OWNER  

I just noticed something that could make for a huge performance improvement in faceting.

The default query used by Datasette when faceting looks like this: sql select country_long, count(*) from ( select * from [global-power-plants] order by rowid ) where country_long is not null group by country_long order by count(*) desc Here it takes 53ms: https://global-power-plants.datasettes.com/global-power-plants?sql=select%0D%0A++country_long%2C%0D%0A++count%28%29%0D%0Afrom+%28%0D%0A++select++from+%5Bglobal-power-plants%5D+order+by+rowid%0D%0A%29%0D%0Awhere%0D%0A++country_long+is+not+null%0D%0Agroup+by%0D%0A++country_long%0D%0Aorder+by%0D%0A++count%28*%29+desc

Note that there's a order by rowid in there which isn't necessary - the order on that inner query doesn't matter since we're grouping and counting.

I had assumed SQLite would optimize this away - but it turns out it doesn't! Consider this version of the query, with that pointless order by removed: select country_long, count(*) from ( select * from [global-power-plants] ) where country_long is not null group by country_long order by count(*) desc https://global-power-plants.datasettes.com/global-power-plants?sql=select%0D%0A++country_long%2C%0D%0A++count%28%29%0D%0Afrom+%28%0D%0A++select++from+%5Bglobal-power-plants%5D%0D%0A%29%0D%0Awhere%0D%0A++country_long+is+not+null%0D%0Agroup+by%0D%0A++country_long%0D%0Aorder+by%0D%0A++count%28*%29+desc runs in 7.2ms!

I tried this optimization on a table with 2.5m rows in it - without the optimization it took 5 seconds, with the optimization it took 450ms. So this is a very significant improvement!

107914493 issue    
{
    "url": "https://api.github.com/repos/simonw/datasette/issues/1394/reactions",
    "total_count": 2,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 1,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed

Links from other tables

  • 2 rows from issues_id in issues_labels
  • 4 rows from issue in issue_comments
Powered by Datasette · Queries took 1.53ms · About: github-to-sqlite