issue_comments

10 rows where issue_url = "https://api.github.com/repos/simonw/datasette/issues/1727", updated_at is on 2022-04-28, and user = 9599, sorted by updated_at descending.
**simonw** (OWNER) · [comment 1112668411](https://github.com/simonw/datasette/issues/1727#issuecomment-1112668411) · created 2022-04-28T21:25:34Z · updated 2022-04-28T21:25:44Z

The two most promising theories at the moment, from here and Twitter and the SQLite forum, are:

A couple of ways to research the in-memory theory:

I need to do some more, better benchmarks using these different approaches. https://twitter.com/laurencerowe/status/1519780174560169987 also suggests:

I like that second idea a lot - I could use the mandelbrot example from https://www.sqlite.org/lang_with.html#outlandish_recursive_query_examples

Issue: Research: demonstrate if parallel SQL queries are worthwhile (1217759117)
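A minimal sketch of that benchmark idea, not from the thread itself: run a cut-down version of the mandelbrot recursive CTE (adapted from the SQLite docs page linked above) serially and then from several threads, one in-memory connection per thread, and compare wall-clock times.

```python
import sqlite3
import threading
import time

# CPU-heavy recursive CTE adapted from the SQLite "outlandish" examples -
# it needs no tables, so it exercises only the SQLite bytecode engine.
MANDELBROT_SQL = """
WITH RECURSIVE
  xaxis(x) AS (VALUES(-2.0) UNION ALL SELECT x + 0.05 FROM xaxis WHERE x < 1.2),
  yaxis(y) AS (VALUES(-1.0) UNION ALL SELECT y + 0.1 FROM yaxis WHERE y < 1.0),
  m(iter, cx, cy, x, y) AS (
    SELECT 0, x, y, 0.0, 0.0 FROM xaxis, yaxis
    UNION ALL
    SELECT iter + 1, cx, cy, x*x - y*y + cx, 2.0*x*y + cy
    FROM m WHERE (x*x + y*y) < 4.0 AND iter < 28
  )
SELECT count(*) FROM m
"""

def run_once():
    # Each thread opens its own in-memory connection
    conn = sqlite3.connect(":memory:")
    conn.execute(MANDELBROT_SQL).fetchall()
    conn.close()

def timed(n_threads):
    start = time.perf_counter()
    threads = [threading.Thread(target=run_once) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

serial = sum(timed(1) for _ in range(4))  # four queries one after another
parallel = timed(4)                       # the same four queries in four threads
print(f"serial: {serial:.3f}s  parallel: {parallel:.3f}s")
```

If the sqlite3 C module really does run queries concurrently across cores, the threaded run should come in well under the serial total; if the GIL dominates, the two numbers will be close.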
**simonw** (OWNER) · [comment 1111726586](https://github.com/simonw/datasette/issues/1727#issuecomment-1111726586) · created 2022-04-28T04:17:16Z · updated 2022-04-28T04:19:31Z

I could experiment with the

Code examples: https://cs.github.com/?scopeName=All+repos&scope=&q=run_in_executor+ProcessPoolExecutor
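For reference, a rough sketch of the `run_in_executor` pattern that search turns up. This uses the thread-pool variant, because an in-memory database cannot be shared across processes; swapping in a `ProcessPoolExecutor` would additionally require a module-level (picklable) worker function and a file-backed database.

```python
import asyncio
import sqlite3
from concurrent.futures import ThreadPoolExecutor

def run_query(sql):
    # The worker opens its own connection. With a ProcessPoolExecutor this
    # function would need to live at module level and the database would
    # need to be a file on disk, since processes share no memory.
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.close()

async def main():
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Two queries dispatched to the pool and awaited concurrently
        return await asyncio.gather(
            loop.run_in_executor(pool, run_query, "SELECT 1 + 1"),
            loop.run_in_executor(pool, run_query, "SELECT 2 + 2"),
        )

results = asyncio.run(main())
print(results)  # → [[(2,)], [(4,)]]
```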
**simonw** (OWNER) · [comment 1111725638](https://github.com/simonw/datasette/issues/1727#issuecomment-1111725638) · created 2022-04-28T04:15:15Z

Useful theory from Keith Medcalf https://sqlite.org/forum/forumpost/e363c69d3441172e

So maybe this is a GIL thing. I should test with some expensive SQL queries (maybe big aggregations against large tables) and see if I can spot an improvement there.
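A sketch of that kind of test, with a synthetic table (not from the thread): time a large aggregation run twice back-to-back versus twice from two threads against a file-backed database.

```python
import os
import sqlite3
import tempfile
import threading
import time

# Build a synthetic table in a temporary on-disk database
path = os.path.join(tempfile.mkdtemp(), "bench.db")
conn = sqlite3.connect(path)
conn.execute("CREATE TABLE nums (n INTEGER)")
conn.executemany("INSERT INTO nums VALUES (?)", ((i,) for i in range(200_000)))
conn.commit()
conn.close()

SQL = "SELECT sum(n * n), avg(n) FROM nums"  # a deliberately expensive aggregation

def query():
    c = sqlite3.connect(path)
    c.execute(SQL).fetchall()
    c.close()

def bench(n_threads):
    start = time.perf_counter()
    threads = [threading.Thread(target=query) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

one = bench(1) + bench(1)  # two aggregations, one after the other
two = bench(2)             # the same two aggregations in parallel
print(f"serial {one:.3f}s, threaded {two:.3f}s")
```

An improvement in the threaded number would suggest the sqlite3 module really does release the GIL while the aggregation runs; near-identical numbers would point back at the GIL (or at some other serialization point).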
**simonw** (OWNER) · [comment 1111699175](https://github.com/simonw/datasette/issues/1727#issuecomment-1111699175) · created 2022-04-28T03:19:48Z · updated 2022-04-28T03:20:08Z

I ran

The area on the right is the threads running the DB queries:

Interactive version here: https://static.simonwillison.net/static/2022/datasette-parallel-profile.svg
**simonw** (OWNER) · [comment 1111683539](https://github.com/simonw/datasette/issues/1727#issuecomment-1111683539) · created 2022-04-28T02:47:57Z

Maybe this is the Python GIL after all?

I've been hoping that the GIL won't be an issue because the

So I've been hoping this means that SQLite code itself can run concurrently on multiple cores even when Python threads cannot. But maybe I'm misunderstanding how that works?
**simonw** (OWNER) · [comment 1111681513](https://github.com/simonw/datasette/issues/1727#issuecomment-1111681513) · created 2022-04-28T02:44:26Z

I could try
**simonw** (OWNER) · [comment 1111661331](https://github.com/simonw/datasette/issues/1727#issuecomment-1111661331) · created 2022-04-28T02:07:31Z

Asked on the SQLite forum about this here: https://sqlite.org/forum/forumpost/ffbfa9f38e
**simonw** (OWNER) · [comment 1111602802](https://github.com/simonw/datasette/issues/1727#issuecomment-1111602802) · created 2022-04-28T00:21:35Z

Tried this but I'm getting back an empty JSON array of traces at the bottom of the page most of the time (intermittently it works correctly):

```diff
diff --git a/datasette/database.py b/datasette/database.py
index ba594a8..d7f9172 100644
--- a/datasette/database.py
+++ b/datasette/database.py
@@ -7,7 +7,7 @@
 import sys
 import threading
 import uuid
-from .tracer import trace
+from .tracer import trace, trace_child_tasks
 from .utils import (
     detect_fts,
     detect_primary_keys,
@@ -207,30 +207,31 @@ class Database:
             time_limit_ms = custom_time_limit
```
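The failure mode here - traces recorded inside parallel child tasks going missing - is the kind of problem a "trace child tasks" mechanism addresses. A toy sketch of the underlying idea (these names are made up for illustration, not Datasette's real tracer internals): a context variable holds the collector list, and because `asyncio.create_task` copies the current context, child tasks append to the same list the parent set up.

```python
import asyncio
import contextvars
from contextlib import contextmanager

# Hypothetical stand-ins, not Datasette's actual tracer API
_traces = contextvars.ContextVar("traces", default=None)

@contextmanager
def capture_traces():
    # Install a fresh collector list for the duration of the request
    token = _traces.set([])
    try:
        yield _traces.get()
    finally:
        _traces.reset(token)

def trace(event):
    collected = _traces.get()
    if collected is not None:
        collected.append(event)

async def child(name):
    # create_task copies the context, so this sees the parent's list object
    trace(f"query:{name}")

async def main():
    with capture_traces() as traces:
        await asyncio.gather(
            asyncio.create_task(child("a")),
            asyncio.create_task(child("b")),
        )
    return traces

traces = asyncio.run(main())
print(traces)  # both child events land in the parent's collector
```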
**simonw** (OWNER) · [comment 1111597176](https://github.com/simonw/datasette/issues/1727#issuecomment-1111597176) · created 2022-04-28T00:11:44Z

Though it would be interesting to also have the trace reveal how much time is spent in the functions that wrap that core SQL - the stuff that is being measured at the moment. I have a hunch that this could help solve the over-arching performance mystery.
**simonw** (OWNER) · [comment 1111595319](https://github.com/simonw/datasette/issues/1727#issuecomment-1111595319) · created 2022-04-28T00:09:45Z · updated 2022-04-28T00:11:01Z

Here's where read queries are instrumented: https://github.com/simonw/datasette/blob/7a6654a253dee243518dc542ce4c06dbb0d0801d/datasette/database.py#L241-L242

So the instrumentation is actually capturing quite a bit of Python activity before it gets to SQLite:

And then:

Ideally I'd like that
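One way to separate "Python activity" from the SQL itself is to take two timestamps: one around the whole instrumented span and one around just the `execute`/`fetchall` call. A minimal sketch of that decomposition (the wrapper steps are placeholders, not Datasette's actual code):

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")

def instrumented_query(sql):
    wrapper_start = time.perf_counter()
    # ... imagine the wrapper work here: time-limit setup, connection
    # acquisition, and so on (placeholders, not Datasette's real steps)
    sql_start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    sql_elapsed = time.perf_counter() - sql_start
    # ... imagine result shaping and tracing bookkeeping here
    total_elapsed = time.perf_counter() - wrapper_start
    return rows, sql_elapsed, total_elapsed

rows, sql_t, total_t = instrumented_query("SELECT 1")
overhead = total_t - sql_t  # time spent outside the actual SQL call
print(rows, f"sql={sql_t:.6f}s overhead={overhead:.6f}s")
```

Recording both numbers in the trace would show how much of each instrumented span is the SQL call versus the Python that wraps it.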
Table schema:

```sql
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id]),
   [performed_via_github_app] TEXT
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
```