github: issue_comments: 3 rows where author_association = "OWNER", issue = 1217759117 and "updated_at" is on date 2022-04-29 sorted by updated

3 rows where author_association = "OWNER", issue = 1217759117 and "updated_at" is on date 2022-04-29 sorted by updated_at descending

id	html_url	issue_url	node_id	user	created_at	updated_at ▲	author_association	body	reactions	issue
1112889800	https://github.com/simonw/datasette/issues/1727#issuecomment-1112889800	https://api.github.com/repos/simonw/datasette/issues/1727	IC_kwDOBm6k_c5CVVnI	simonw 9599	2022-04-29T05:29:38Z	2022-04-29T05:29:38Z	OWNER	OK, I just got the most incredible result with that! I started up a container running `bash` like this, from my `datasette` checkout. I'm mapping port 8005 on my laptop to port 8001 inside the container because laptop port 8001 was already doing something else: `docker run -it --rm --name my-running-script -p 8005:8001 -v "$PWD":/usr/src/myapp -w /usr/src/myapp nogil/python bash` Then in `bash` I ran the following commands to install Datasette and its dependencies: `pip install -e '.[test]'` `pip install datasette-pretty-traces # For debug tracing` Then I started Datasette against my `github.db` database (from github-to-sqlite.dogsheep.net/github.db) like this: `datasette github.db -h 0.0.0.0 --setting trace_debug 1` I hit the following two URLs to compare the parallel vs. non-parallel implementations: `http://127.0.0.1:8005/github/issues?_facet=milestone&_facet=repo&_trace=1&_size=10` `http://127.0.0.1:8005/github/issues?_facet=milestone&_facet=repo&_trace=1&_size=10&_noparallel=1` And... the parallel one beat the non-parallel one decisively, on multiple page refreshes! Not parallel: 77ms. Parallel: 47ms. So yeah, I'm very confident this is a problem with the GIL. And I am absolutely stunned that @colesbury's fork ran Datasette (which has some reasonably tricky threading and async stuff going on) out of the box!	{ "total_count": 2, "+1": 2, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Research: demonstrate if parallel SQL queries are worthwhile 1217759117
1112879463	https://github.com/simonw/datasette/issues/1727#issuecomment-1112879463	https://api.github.com/repos/simonw/datasette/issues/1727	IC_kwDOBm6k_c5CVTFn	simonw 9599	2022-04-29T05:03:58Z	2022-04-29T05:03:58Z	OWNER	It would be really fun to try running this with the in-development `nogil` Python from https://github.com/colesbury/nogil. There's a Docker container for it: https://hub.docker.com/r/nogil/python. It suggests you can run something like this: `docker run -it --rm --name my-running-script -v "$PWD":/usr/src/myapp -w /usr/src/myapp nogil/python python your-daemon-or-script.py`	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Research: demonstrate if parallel SQL queries are worthwhile 1217759117
1112878955	https://github.com/simonw/datasette/issues/1727#issuecomment-1112878955	https://api.github.com/repos/simonw/datasette/issues/1727	IC_kwDOBm6k_c5CVS9r	simonw 9599	2022-04-29T05:02:40Z	2022-04-29T05:02:40Z	OWNER	Here's a very useful (recent) article about how the GIL works and how to think about it: https://pythonspeed.com/articles/python-gil/ - via https://lobste.rs/s/9hj80j/when_python_can_t_thread_deep_dive_into_gil From that article: "For example, let's consider an extension module written in C or Rust that lets you talk to a PostgreSQL database server. Conceptually, handling a SQL query with this library will go through three steps: (1) Deserialize from Python to the internal library representation. Since this will be reading Python objects, it needs to hold the GIL. (2) Send the query to the database server, and wait for a response. This doesn't need the GIL. (3) Convert the response into Python objects. This needs the GIL again. As you can see, how much parallelism you can get depends on how much time is spent in each step. If the bulk of time is spent in step 2, you'll get parallelism there. But if, for example, you run a `SELECT` and get a large number of rows back, the library will need to create many Python objects, and step 3 will have to hold GIL for a while." That explains what I'm seeing here. I'm pretty convinced now that the reason I'm not getting a performance boost from parallel queries is that there's more time spent in Python code assembling the results than in SQLite C code executing the query.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Research: demonstrate if parallel SQL queries are worthwhile 1217759117

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id]),
   [performed_via_github_app] TEXT
);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);
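
The filter in this page's title (OWNER comments on issue 1217759117 updated on 2022-04-29, newest first) can be expressed as a plain SQL query against this schema. The sketch below uses Python's `sqlite3` module with an in-memory database. The reduced table (only the filtered columns, no foreign keys), the two extra rows, and the helper name `owner_comments_on_date` are assumptions made for illustration, and the `date(updated_at)` comparison is one way to write the date filter, not necessarily the exact SQL Datasette generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced version of the issue_comments table: only the columns used in the filter.
conn.execute(
    """
    CREATE TABLE issue_comments (
        id INTEGER PRIMARY KEY,
        updated_at TEXT,
        author_association TEXT,
        issue INTEGER
    )
    """
)
rows = [
    # The three comments listed on this page.
    (1112889800, "2022-04-29T05:29:38Z", "OWNER", 1217759117),
    (1112879463, "2022-04-29T05:03:58Z", "OWNER", 1217759117),
    (1112878955, "2022-04-29T05:02:40Z", "OWNER", 1217759117),
    # Hypothetical rows that the filter should exclude.
    (1, "2022-04-28T10:00:00Z", "OWNER", 1217759117),
    (2, "2022-04-29T06:00:00Z", "NONE", 1217759117),
]
conn.executemany("INSERT INTO issue_comments VALUES (?, ?, ?, ?)", rows)


def owner_comments_on_date(conn, issue, day):
    # date() truncates the ISO 8601 timestamp to YYYY-MM-DD before comparing.
    sql = """
        SELECT id FROM issue_comments
        WHERE author_association = 'OWNER'
          AND issue = ?
          AND date(updated_at) = ?
        ORDER BY updated_at DESC
    """
    return [row[0] for row in conn.execute(sql, (issue, day))]


matching_ids = owner_comments_on_date(conn, 1217759117, "2022-04-29")
print(matching_ids)
```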