issue_comments
3 rows where author_association = "OWNER", issue = 1217759117 and "updated_at" is on date 2022-04-29 sorted by updated_at descending
id | html_url | issue_url | node_id | user | created_at | updated_at ▲ | author_association | body | reactions | issue | performed_via_github_app |
---|---|---|---|---|---|---|---|---|---|---|---|
1112889800 | https://github.com/simonw/datasette/issues/1727#issuecomment-1112889800 | https://api.github.com/repos/simonw/datasette/issues/1727 | IC_kwDOBm6k_c5CVVnI | simonw 9599 | 2022-04-29T05:29:38Z | 2022-04-29T05:29:38Z | OWNER | OK, I just got the most incredible result with that! I started up a container running
And... the parallel one beat the non-parallel one decisively, on multiple page refreshes! Not parallel: 77ms. Parallel: 47ms. So yeah, I'm very confident this is a problem with the GIL. And I am absolutely stunned that @colesbury's fork ran Datasette (which has some reasonably tricky threading and async stuff going on) out of the box! |
{ "total_count": 2, "+1": 2, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Research: demonstrate if parallel SQL queries are worthwhile 1217759117 | |
1112879463 | https://github.com/simonw/datasette/issues/1727#issuecomment-1112879463 | https://api.github.com/repos/simonw/datasette/issues/1727 | IC_kwDOBm6k_c5CVTFn | simonw 9599 | 2022-04-29T05:03:58Z | 2022-04-29T05:03:58Z | OWNER | It would be really fun to try running this with the in-development There's a Docker container for it: https://hub.docker.com/r/nogil/python It suggests you can run something like this:
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Research: demonstrate if parallel SQL queries are worthwhile 1217759117 | |
1112878955 | https://github.com/simonw/datasette/issues/1727#issuecomment-1112878955 | https://api.github.com/repos/simonw/datasette/issues/1727 | IC_kwDOBm6k_c5CVS9r | simonw 9599 | 2022-04-29T05:02:40Z | 2022-04-29T05:02:40Z | OWNER | Here's a very useful (recent) article about how the GIL works and how to think about it: https://pythonspeed.com/articles/python-gil/ - via https://lobste.rs/s/9hj80j/when_python_can_t_thread_deep_dive_into_gil From that article:
That explains what I'm seeing here. I'm pretty convinced now that the reason I'm not getting a performance boost from parallel queries is that there's more time spent in Python code assembling the results than in SQLite C code executing the query. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Research: demonstrate if parallel SQL queries are worthwhile 1217759117 |
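
The reasoning in the comments above (parallel queries give little benefit when more time goes into Python code assembling results than into SQLite C code running the query) can be explored with a small experiment. The following is a minimal sketch and not Datasette's implementation: the table `t`, the three queries, and the helper names `build_db`, `run_query`, `run_serial`, `run_parallel` and `timed` are all hypothetical. It runs the same queries one after another and then in a thread pool, and prints both timings. No particular timing is claimed here; the numbers depend on the machine, the Python build and the query mix.

```python
import os
import sqlite3
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical queries against a hypothetical table; not taken from Datasette.
QUERIES = [
    "select n, count(*) from t group by n order by n",
    "select count(*) from t where n > 50",
    "select sum(n) from t",
]


def build_db(path, rows=50_000):
    """Create a small on-disk database so each thread can open its own connection."""
    conn = sqlite3.connect(path)
    conn.execute("create table t (id integer primary key, n integer)")
    conn.executemany("insert into t (n) values (?)", ((i % 97,) for i in range(rows)))
    conn.commit()
    conn.close()


def run_query(path, sql):
    # A fresh connection per call, so no connection is shared across threads.
    conn = sqlite3.connect(path)
    try:
        # fetchall() builds Python tuples: this is the "assembling results" part.
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


def run_serial(path):
    return [run_query(path, sql) for sql in QUERIES]


def run_parallel(path):
    with ThreadPoolExecutor(max_workers=len(QUERIES)) as pool:
        return list(pool.map(lambda sql: run_query(path, sql), QUERIES))


def timed(fn, path):
    """Return (result, elapsed milliseconds) for one call of fn(path)."""
    start = time.perf_counter()
    result = fn(path)
    return result, (time.perf_counter() - start) * 1000


db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
build_db(db_path)
serial_result, serial_ms = timed(run_serial, db_path)
parallel_result, parallel_ms = timed(run_parallel, db_path)
print(f"Not parallel: {serial_ms:.1f}ms  Parallel: {parallel_ms:.1f}ms")
```

Both runs return identical rows; only the elapsed time differs. Queries that return many rows shift more of the work into Python-level result building, which is the situation the comments attribute to the GIL.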
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id]),
   [performed_via_github_app] TEXT
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
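
The filter described at the top of this page (author_association = "OWNER", issue = 1217759117, updated_at on date 2022-04-29, sorted by updated_at descending) can be approximated with Python's `sqlite3` module against the schema above. This is a sketch under stated assumptions: the database is in-memory, only four columns are populated, and the use of SQLite's `date()` function for the "is on date" filter is my assumption, since the SQL Datasette generated is not shown on this page. The three ids and timestamps come from the rows above; the fourth row is hypothetical and exists only to show the date filter excluding something.

```python
import sqlite3

SCHEMA = """
CREATE TABLE [issue_comments] (
   [html_url] TEXT, [issue_url] TEXT, [id] INTEGER PRIMARY KEY, [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]), [created_at] TEXT, [updated_at] TEXT,
   [author_association] TEXT, [body] TEXT, [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id]), [performed_via_github_app] TEXT
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

rows = [
    # (id, updated_at, author_association, issue), taken from the table above
    (1112878955, "2022-04-29T05:02:40Z", "OWNER", 1217759117),
    (1112889800, "2022-04-29T05:29:38Z", "OWNER", 1217759117),
    (1112879463, "2022-04-29T05:03:58Z", "OWNER", 1217759117),
    # Hypothetical row on a different date, added only to exercise the filter
    (1, "2022-04-30T00:00:00Z", "OWNER", 1217759117),
]
conn.executemany(
    "insert into issue_comments (id, updated_at, author_association, issue) "
    "values (?, ?, ?, ?)",
    rows,
)

# date() truncates the ISO 8601 timestamp to YYYY-MM-DD (an assumed equivalent
# of the "is on date" filter).
sql = """
select id, updated_at from issue_comments
where author_association = ? and issue = ? and date(updated_at) = ?
order by updated_at desc
"""
matching = conn.execute(sql, ("OWNER", 1217759117, "2022-04-29")).fetchall()
matching_ids = [row[0] for row in matching]
print(matching_ids)
```

The query returns the three comment ids in the same order as the rows listed on this page, newest update first.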