github

This data as json, CSV

html_url	issue_url	id	node_id	user	created_at	updated_at	author_association	body	reactions	issue	performed_via_github_app
https://github.com/simonw/sqlite-utils/issues/121#issuecomment-655898722	https://api.github.com/repos/simonw/sqlite-utils/issues/121	655898722	MDEyOklzc3VlQ29tbWVudDY1NTg5ODcyMg==	79913	2020-07-09T04:53:08Z	2020-07-09T04:53:08Z	CONTRIBUTOR	Yep, I agree that makes more sense for backwards compat and more casual use cases. I think it should be possible for the Database/Queryable methods to DTRT based on seeing if it's within a context-manager-managed transaction.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	652961907
https://github.com/simonw/sqlite-utils/issues/121#issuecomment-655652679	https://api.github.com/repos/simonw/sqlite-utils/issues/121	655652679	MDEyOklzc3VlQ29tbWVudDY1NTY1MjY3OQ==	79913	2020-07-08T17:24:46Z	2020-07-08T17:24:46Z	CONTRIBUTOR	Better transaction handling would be really great. Some of my thoughts on implementing better transaction discipline are in https://github.com/simonw/sqlite-utils/pull/118#issuecomment-655239728. My preferences: - Each CLI command should operate in a single transaction so that either the whole thing succeeds or the whole thing is rolled back. This avoids partially completed operations when an error occurs part way through processing. Partially completed operations are typically much harder to recovery from gracefully and may cause inconsistent data states. - The Python API should be transaction-agnostic and rely on the caller to coordinate transactions. Only the caller knows how individual insert, create, update, etc operations/methods should be bundled conceptually into transactions. When the caller is the CLI, for example, that bundling would be at the CLI command-level. Other callers might want to break up operations into multiple transactions. Transactions are usually most useful when controlled at the application-level (like logging configuration) instead of the library level. The library needs to provide an API that's conducive to transaction use, though. - The Python API should provide a context manager to provide consistent transactions handling with more useful defaults than Python's `sqlite3` module. The latter issues implicit `BEGIN` statements by default for most DML (`INSERT`, `UPDATE`, `DELETE`, … but not `SELECT`, I believe), but not DDL (`CREATE TABLE`, `DROP TABLE`, `CREATE VIEW`, …). Notably, the `sqlite3` module doesn't issue the implicit `BEGIN` until the first DML statement. It _does not_ issue it when entering the `with conn` block, like other DBAPI2-compatible modules do. The `with conn` block for `sqlite3` only arranges to commit or rollback an existing transaction when exiting. Including DDL and `SELECT`s in transactions is important for operation consistency, though. There are several existing bugs.python.org tickets about this and future changes are in the works, but sql…	{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	652961907
https://github.com/simonw/sqlite-utils/pull/118#issuecomment-655643078	https://api.github.com/repos/simonw/sqlite-utils/issues/118	655643078	MDEyOklzc3VlQ29tbWVudDY1NTY0MzA3OA==	79913	2020-07-08T17:05:59Z	2020-07-08T17:05:59Z	CONTRIBUTOR	> The only thing missing from this PR is updates to the documentation. Ah, yes, thanks for this reminder! I've repushed with doc bits added.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	651844316
https://github.com/simonw/sqlite-utils/pull/118#issuecomment-655239728	https://api.github.com/repos/simonw/sqlite-utils/issues/118	655239728	MDEyOklzc3VlQ29tbWVudDY1NTIzOTcyOA==	79913	2020-07-08T02:16:42Z	2020-07-08T02:16:42Z	CONTRIBUTOR	I fixed my original oops by moving the `DELETE FROM $table` out of the chunking loop and repushed. I think this change can be considered in isolation from issues around transactions, which I discuss next. I wanted to make the DELETE + INSERT happen all in the same transaction so it was robust, but that was more complicated than I expected. The transaction handling in the Database/Table classes isn't systematic, and this poses big hurdles to making `Table.insert_all` (or other operations) consistent and robust in the face of errors. For example, I wanted to do this (whitespace ignored in diff, so indentation change not highlighted): ```diff diff --git a/sqlite_utils/db.py b/sqlite_utils/db.py index d6b9ecf..4107ceb 100644 --- a/sqlite_utils/db.py +++ b/sqlite_utils/db.py @@ -1028,6 +1028,11 @@ class Table(Queryable): batch_size = max(1, min(batch_size, SQLITE_MAX_VARS // num_columns)) self.last_rowid = None self.last_pk = None + with self.db.conn: + # Explicit BEGIN is necessary because Python's sqlite3 doesn't + # issue implicit BEGINs for DDL, only DML. We mix DDL and DML + # below and might execute DDL first, e.g. for table creation. + self.db.conn.execute("BEGIN") if truncate and self.exists(): self.db.conn.execute("DELETE FROM [{}];".format(self.name)) for chunk in chunks(itertools.chain([first_record], records), batch_size): @@ -1038,7 +1043,11 @@ class Table(Queryable): # Use the first batch to derive the table names column_types = suggest_column_types(chunk) column_types.update(columns or {}) - self.create( + # Not self.create() because that is wrapped in its own + # transaction and Python's sqlite3 doesn't support + # nested transactions. + self.db.create_table( + …	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	651844316
https://github.com/simonw/sqlite-utils/pull/118#issuecomment-655052451	https://api.github.com/repos/simonw/sqlite-utils/issues/118	655052451	MDEyOklzc3VlQ29tbWVudDY1NTA1MjQ1MQ==	79913	2020-07-07T18:45:23Z	2020-07-07T18:45:23Z	CONTRIBUTOR	Ah, I see the problem. The truncate is inside a loop I didn't realize was there.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	651844316
https://github.com/simonw/sqlite-utils/pull/118#issuecomment-655018966	https://api.github.com/repos/simonw/sqlite-utils/issues/118	655018966	MDEyOklzc3VlQ29tbWVudDY1NTAxODk2Ng==	79913	2020-07-07T17:41:06Z	2020-07-07T17:41:06Z	CONTRIBUTOR	Hmm, while tests pass, this may not work as intended on larger datasets. Looking into it.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	651844316
https://github.com/simonw/datasette/issues/838#issuecomment-643083451	https://api.github.com/repos/simonw/datasette/issues/838	643083451	MDEyOklzc3VlQ29tbWVudDY0MzA4MzQ1MQ==	79913	2020-06-12T06:04:14Z	2020-06-12T06:04:14Z	NONE	Hmm, I haven't tried removing `ProxyPassReverse`, but it doesn't touch the HTML, which is the issue I'm seeing. You can read the [documentation here](https://httpd.apache.org/docs/2.4/mod/mod_proxy.html#proxypassreverse). `ProxyPassReverse` is a standard directive when proxying with Apache. I've used it dozens of times with other applications. Looking a little more at the code, I think the issue here is that the behaviour of `base_url` makes sense when Datasette is _mounted_ at a path within a larger application, but not when HTTP requests are being _proxied_ to it. In a _mount_ situation, it is perfectly fine to construct URLs reusing the domain and path from the request. In a _proxy_ situation, it never is, as the domain and path in the request are not the domain and path that the non-proxy client actually needs to use. That is, links which include the Apache → Datasette request origin, `localhost:8001`, instead of the browser → Apache request origin, `example.com`, will be broken. The tests you pointed to also reflect this in two ways: 1. They strip a leading `http://localhost`, allowing such URLs in the facet links to pass, but inclusion of that in a proxy situation would mean the URL is broken. 2. The test client emits direct ASGI events instead of actual proxied HTTP requests. The headers of these ASGI events don't reflect the way an HTTP proxy works; instead they pass through the original request path which contains `base_url`. This works because Datasette responds to requests equivalently at either `/…` or `/{base_url}/…`, which makes some sense in a _mount_ situation but is unconventional (albeit workable) for a proxied app. Apps that support being proxied automatically support being mounted, but apps that only support being mounted don't automatically support being proxied.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	637395097