github: issue_comments: 15 rows where author_association = "OWNER", "created_at" is on date 2022-04-27, issue = 1217759117 and "updated_at" is on date 2022-04-27 sorted by updated

15 rows where author_association = "OWNER", "created_at" is on date 2022-04-27, issue = 1217759117 and "updated_at" is on date 2022-04-27 sorted by updated_at descending

Search:

descending

id	html_url	issue_url	node_id	user	created_at	updated_at ▲	author_association	body	reactions	issue
1111558204	https://github.com/simonw/datasette/issues/1727#issuecomment-1111558204	https://api.github.com/repos/simonw/datasette/issues/1727	IC_kwDOBm6k_c5CQQg8	simonw 9599	2022-04-27T22:58:39Z	2022-04-27T22:58:39Z	OWNER	I should check my timing mechanism. Am I capturing the time taken just in SQLite or does it include time spent in Python crossing between async and threaded world and waiting for a thread pool worker to become available? That could explain the longer query times.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Research: demonstrate if parallel SQL queries are worthwhile 1217759117
1111553029	https://github.com/simonw/datasette/issues/1727#issuecomment-1111553029	https://api.github.com/repos/simonw/datasette/issues/1727	IC_kwDOBm6k_c5CQPQF	simonw 9599	2022-04-27T22:48:21Z	2022-04-27T22:48:21Z	OWNER	I wonder if it would be worth exploring multiprocessing here.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Research: demonstrate if parallel SQL queries are worthwhile 1217759117
1111551076	https://github.com/simonw/datasette/issues/1727#issuecomment-1111551076	https://api.github.com/repos/simonw/datasette/issues/1727	IC_kwDOBm6k_c5CQOxk	simonw 9599	2022-04-27T22:44:51Z	2022-04-27T22:45:04Z	OWNER	Really wild idea: what if I created three copies of the SQLite database file - as three separate file names - and then balanced the parallel queries across all these? Any chance that could avoid any mysterious locking issues?	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Research: demonstrate if parallel SQL queries are worthwhile 1217759117
1111535818	https://github.com/simonw/datasette/issues/1727#issuecomment-1111535818	https://api.github.com/repos/simonw/datasette/issues/1727	IC_kwDOBm6k_c5CQLDK	simonw 9599	2022-04-27T22:18:45Z	2022-04-27T22:18:45Z	OWNER	Another avenue: https://twitter.com/weargoggles/status/1519426289920270337 SQLite has its own mutexes to provide thread safety, which as another poster noted are out of play in multi process setups. Perhaps downgrading from the “serializable” to “multi-threaded” safety would be okay for Datasette? https://sqlite.org/c3ref/c_config_covering_index_scan.html#sqliteconfigmultithread Doesn't look like there's an obvious way to access that from Python via the `sqlite3` module though.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Research: demonstrate if parallel SQL queries are worthwhile 1217759117
1111485722	https://github.com/simonw/datasette/issues/1727#issuecomment-1111485722	https://api.github.com/repos/simonw/datasette/issues/1727	IC_kwDOBm6k_c5CP-0a	simonw 9599	2022-04-27T21:08:20Z	2022-04-27T21:08:20Z	OWNER	Tried that and it didn't seem to make a difference either. I really need a much deeper view of what's going on here.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Research: demonstrate if parallel SQL queries are worthwhile 1217759117
1111462442	https://github.com/simonw/datasette/issues/1727#issuecomment-1111462442	https://api.github.com/repos/simonw/datasette/issues/1727	IC_kwDOBm6k_c5CP5Iq	simonw 9599	2022-04-27T20:40:59Z	2022-04-27T20:42:49Z	OWNER	This looks VERY relevant: SQLite Shared-Cache Mode: SQLite includes a special "shared-cache" mode (disabled by default) intended for use in embedded servers. If shared-cache mode is enabled and a thread establishes multiple connections to the same database, the connections share a single data and schema cache. This can significantly reduce the quantity of memory and IO required by the system. Enabled as part of the URI filename: `ATTACH 'file:aux.db?cache=shared' AS aux;` Turns out I'm already using this for in-memory databases that have `.memory_name` set, but not (yet) for regular file-backed databases: https://github.com/simonw/datasette/blob/7a6654a253dee243518dc542ce4c06dbb0d0801d/datasette/database.py#L73-L75	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Research: demonstrate if parallel SQL queries are worthwhile 1217759117
1111460068	https://github.com/simonw/datasette/issues/1727#issuecomment-1111460068	https://api.github.com/repos/simonw/datasette/issues/1727	IC_kwDOBm6k_c5CP4jk	simonw 9599	2022-04-27T20:38:32Z	2022-04-27T20:38:32Z	OWNER	WAL mode didn't seem to make a difference. I thought there was a chance it might help multiple read connections operate at the same time but it looks like it really does only matter for when writes are going on.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Research: demonstrate if parallel SQL queries are worthwhile 1217759117
1111456500	https://github.com/simonw/datasette/issues/1727#issuecomment-1111456500	https://api.github.com/repos/simonw/datasette/issues/1727	IC_kwDOBm6k_c5CP3r0	simonw 9599	2022-04-27T20:36:01Z	2022-04-27T20:36:01Z	OWNER	Yeah all of this is pretty much assuming read-only connections. Datasette has a separate mechanism for ensuring that writes are executed one at a time against a dedicated connection from an in-memory queue: - https://github.com/simonw/datasette/issues/682	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Research: demonstrate if parallel SQL queries are worthwhile 1217759117
1111442012	https://github.com/simonw/datasette/issues/1727#issuecomment-1111442012	https://api.github.com/repos/simonw/datasette/issues/1727	IC_kwDOBm6k_c5CP0Jc	simonw 9599	2022-04-27T20:19:00Z	2022-04-27T20:19:00Z	OWNER	Something worth digging into: are these parallel queries running against the same SQLite connection or are they each rubbing against a separate SQLite connection? Just realized I know the answer: they're running against separate SQLite connections, because that's how the time limit mechanism works: it installs a progress handler for each connection which terminates it after a set time. This means that if SQLite benefits from multiple threads using the same connection (due to shared caches or similar) then Datasette will not be seeing those benefits. It also means that if there's some mechanism within SQLite that penalizes you for having multiple parallel connections to a single file (just guessing here, maybe there's some kind of locking going on?) then Datasette will suffer those penalties. I should try seeing what happens with WAL mode enabled.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Research: demonstrate if parallel SQL queries are worthwhile 1217759117
1111432375	https://github.com/simonw/datasette/issues/1727#issuecomment-1111432375	https://api.github.com/repos/simonw/datasette/issues/1727	IC_kwDOBm6k_c5CPxy3	simonw 9599	2022-04-27T20:07:57Z	2022-04-27T20:07:57Z	OWNER	Also useful: https://avi.im/blag/2021/fast-sqlite-inserts/ - from a tip on Twitter: https://twitter.com/ricardoanderegg/status/1519402047556235264	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Research: demonstrate if parallel SQL queries are worthwhile 1217759117
1111431785	https://github.com/simonw/datasette/issues/1727#issuecomment-1111431785	https://api.github.com/repos/simonw/datasette/issues/1727	IC_kwDOBm6k_c5CPxpp	simonw 9599	2022-04-27T20:07:16Z	2022-04-27T20:07:16Z	OWNER	I think I need some much more in-depth tracing tricks for this. https://www.maartenbreddels.com/perf/jupyter/python/tracing/gil/2021/01/14/Tracing-the-Python-GIL.html looks relevant - uses the `perf` tool on Linux.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Research: demonstrate if parallel SQL queries are worthwhile 1217759117
1111408273	https://github.com/simonw/datasette/issues/1727#issuecomment-1111408273	https://api.github.com/repos/simonw/datasette/issues/1727	IC_kwDOBm6k_c5CPr6R	simonw 9599	2022-04-27T19:40:51Z	2022-04-27T19:42:17Z	OWNER	Relevant: here's the code that sets up a Datasette SQLite connection: https://github.com/simonw/datasette/blob/7a6654a253dee243518dc542ce4c06dbb0d0801d/datasette/database.py#L73-L96 It's using `check_same_thread=False` - here's the Python docs on that: By default, check_same_thread is `True` and only the creating thread may use the connection. If set `False`, the returned connection may be shared across multiple threads. When using multiple threads with the same connection writing operations should be serialized by the user to avoid data corruption. This is why Datasette reserves a single connection for write queries and queues them up in memory, as described here.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Research: demonstrate if parallel SQL queries are worthwhile 1217759117
1111390433	https://github.com/simonw/datasette/issues/1727#issuecomment-1111390433	https://api.github.com/repos/simonw/datasette/issues/1727	IC_kwDOBm6k_c5CPnjh	simonw 9599	2022-04-27T19:21:02Z	2022-04-27T19:21:02Z	OWNER	One weird thing: I noticed that in the parallel trace above the SQL query bars are wider. Mousover shows duration in ms, and I got 13ms for this query: `select message as value, count(*) as n from (` But in the `?_noparallel=1` version that some query took 2.97ms. Given those numbers though I would expect the overall page time to be MUCH worse for the parallel version - but the page load times are instead very close to each other, with parallel often winning. This is super-weird.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Research: demonstrate if parallel SQL queries are worthwhile 1217759117
1111385875	https://github.com/simonw/datasette/issues/1727#issuecomment-1111385875	https://api.github.com/repos/simonw/datasette/issues/1727	IC_kwDOBm6k_c5CPmcT	simonw 9599	2022-04-27T19:16:57Z	2022-04-27T19:16:57Z	OWNER	I just remembered the `--setting num_sql_threads` option... which defaults to 3! https://github.com/simonw/datasette/blob/942411ef946e9a34a2094944d3423cddad27efd3/datasette/app.py#L109-L113 Would explain why the first trace never seems to show more than three SQL queries executing at once.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Research: demonstrate if parallel SQL queries are worthwhile 1217759117
1111380282	https://github.com/simonw/datasette/issues/1727#issuecomment-1111380282	https://api.github.com/repos/simonw/datasette/issues/1727	IC_kwDOBm6k_c5CPlE6	simonw 9599	2022-04-27T19:10:27Z	2022-04-27T19:10:27Z	OWNER	Wrote more about that here: https://simonwillison.net/2022/Apr/27/parallel-queries/ Compare https://latest-with-plugins.datasette.io/github/commits?_facet=repo&_facet=committer&_trace=1 With the same thing but with parallel execution disabled: https://latest-with-plugins.datasette.io/github/commits?_facet=repo&_facet=committer&_trace=1&_noparallel=1 Those total page load time numbers are very similar. Is this parallel optimization worthwhile? Maybe it's only worth it on larger databases? Or maybe larger databases perform worse with this?	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Research: demonstrate if parallel SQL queries are worthwhile 1217759117

Advanced export

JSON shape: default, array, newline-delimited, object

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
, [performed_via_github_app] TEXT);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);