{"html_url": "https://github.com/simonw/sqlite-utils/issues/121#issuecomment-655898722", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/121", "id": 655898722, "node_id": "MDEyOklzc3VlQ29tbWVudDY1NTg5ODcyMg==", "user": {"value": 79913, "label": "tsibley"}, "created_at": "2020-07-09T04:53:08Z", "updated_at": "2020-07-09T04:53:08Z", "author_association": "CONTRIBUTOR", "body": "Yep, I agree that makes more sense for backwards compat and more casual use cases. I think it should be possible for the Database/Queryable methods to DTRT based on seeing if it's within a context-manager-managed transaction.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 652961907, "label": "Improved (and better documented) support for transactions"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/121#issuecomment-655652679", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/121", "id": 655652679, "node_id": "MDEyOklzc3VlQ29tbWVudDY1NTY1MjY3OQ==", "user": {"value": 79913, "label": "tsibley"}, "created_at": "2020-07-08T17:24:46Z", "updated_at": "2020-07-08T17:24:46Z", "author_association": "CONTRIBUTOR", "body": "Better transaction handling would be really great. Some of my thoughts on implementing better transaction discipline are in https://github.com/simonw/sqlite-utils/pull/118#issuecomment-655239728.\r\n\r\nMy preferences:\r\n\r\n- Each CLI command should operate in a single transaction so that either the whole thing succeeds or the whole thing is rolled back. This avoids partially completed operations when an error occurs part way through processing. Partially completed operations are typically much harder to recovery from gracefully and may cause inconsistent data states.\r\n\r\n- The Python API should be transaction-agnostic and rely on the caller to coordinate transactions. Only the caller knows how individual insert, create, update, etc operations/methods should be bundled conceptually into transactions. When the caller is the CLI, for example, that bundling would be at the CLI command-level. Other callers might want to break up operations into multiple transactions. Transactions are usually most useful when controlled at the application-level (like logging configuration) instead of the library level. The library needs to provide an API that's conducive to transaction use, though.\r\n\r\n- The Python API should provide a context manager to provide consistent transactions handling with more useful defaults than Python's `sqlite3` module. The latter issues implicit `BEGIN` statements by default for most DML (`INSERT`, `UPDATE`, `DELETE`, \u2026 but not `SELECT`, I believe), but **not** DDL (`CREATE TABLE`, `DROP TABLE`, `CREATE VIEW`, \u2026). Notably, the `sqlite3` module doesn't issue the implicit `BEGIN` until the first DML statement. It _does not_ issue it when entering the `with conn` block, like other DBAPI2-compatible modules do. The `with conn` block for `sqlite3` only arranges to commit or rollback an existing transaction when exiting. Including DDL and `SELECT`s in transactions is important for operation consistency, though. There are several existing bugs.python.org tickets about this and future changes are in the works, but sqlite-utils can provide its own API sooner. sqlite-utils's `Database` class could itself be a context manager (built on the `sqlite3` connection context manager) which additionally issues an explicit `BEGIN` when entering. This would then let Python API callers do something like:\r\n\r\n```python\r\ndb = sqlite_utils.Database(path)\r\n\r\nwith db: # \u2190 BEGIN issued here by Database.__enter__\r\n db.insert(\u2026)\r\n db.create_view(\u2026)\r\n# \u2190 COMMIT/ROLLBACK issue here by sqlite3.connection.__exit__\r\n```", "reactions": "{\"total_count\": 1, \"+1\": 1, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 652961907, "label": "Improved (and better documented) support for transactions"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/pull/118#issuecomment-655643078", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/118", "id": 655643078, "node_id": "MDEyOklzc3VlQ29tbWVudDY1NTY0MzA3OA==", "user": {"value": 79913, "label": "tsibley"}, "created_at": "2020-07-08T17:05:59Z", "updated_at": "2020-07-08T17:05:59Z", "author_association": "CONTRIBUTOR", "body": "> The only thing missing from this PR is updates to the documentation.\r\n\r\nAh, yes, thanks for this reminder! I've repushed with doc bits added.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 651844316, "label": "Add insert --truncate option"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/pull/118#issuecomment-655239728", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/118", "id": 655239728, "node_id": "MDEyOklzc3VlQ29tbWVudDY1NTIzOTcyOA==", "user": {"value": 79913, "label": "tsibley"}, "created_at": "2020-07-08T02:16:42Z", "updated_at": "2020-07-08T02:16:42Z", "author_association": "CONTRIBUTOR", "body": "I fixed my original oops by moving the `DELETE FROM $table` out of the chunking loop and repushed. I think this change can be considered in isolation from issues around transactions, which I discuss next.\r\n\r\nI wanted to make the DELETE + INSERT happen all in the same transaction so it was robust, but that was more complicated than I expected. The transaction handling in the Database/Table classes isn't systematic, and this poses big hurdles to making `Table.insert_all` (or other operations) consistent and robust in the face of errors.\r\n\r\nFor example, I wanted to do this (whitespace ignored in diff, so indentation change not highlighted):\r\n\r\n```diff\r\ndiff --git a/sqlite_utils/db.py b/sqlite_utils/db.py\r\nindex d6b9ecf..4107ceb 100644\r\n--- a/sqlite_utils/db.py\r\n+++ b/sqlite_utils/db.py\r\n@@ -1028,6 +1028,11 @@ class Table(Queryable):\r\n batch_size = max(1, min(batch_size, SQLITE_MAX_VARS // num_columns))\r\n self.last_rowid = None\r\n self.last_pk = None\r\n+ with self.db.conn:\r\n+ # Explicit BEGIN is necessary because Python's sqlite3 doesn't\r\n+ # issue implicit BEGINs for DDL, only DML. We mix DDL and DML\r\n+ # below and might execute DDL first, e.g. for table creation.\r\n+ self.db.conn.execute(\"BEGIN\")\r\n if truncate and self.exists():\r\n self.db.conn.execute(\"DELETE FROM [{}];\".format(self.name))\r\n for chunk in chunks(itertools.chain([first_record], records), batch_size):\r\n@@ -1038,7 +1043,11 @@ class Table(Queryable):\r\n # Use the first batch to derive the table names\r\n column_types = suggest_column_types(chunk)\r\n column_types.update(columns or {})\r\n- self.create(\r\n+ # Not self.create() because that is wrapped in its own\r\n+ # transaction and Python's sqlite3 doesn't support\r\n+ # nested transactions.\r\n+ self.db.create_table(\r\n+ self.name,\r\n column_types,\r\n pk,\r\n foreign_keys,\r\n@@ -1139,7 +1148,6 @@ class Table(Queryable):\r\n flat_values = list(itertools.chain(*values))\r\n queries_and_params = [(sql, flat_values)]\r\n \r\n- with self.db.conn:\r\n for query, params in queries_and_params:\r\n try:\r\n result = self.db.conn.execute(query, params)\r\n```\r\n\r\nbut that fails in tests because other methods call `insert/upsert/insert_all/upsert_all` in the middle of their transactions, so the BEGIN statement throws an error (no nested transactions allowed).\r\n\r\nStepping back, it would be nice to make the transaction handling systematic and predictable. One way to do this is to make the `sqlite_utils/db.py` code generally not begin or commit any transactions, and require the caller to do that instead. This lets the caller mix and match the Python API calls into transactions as appropriate (which is impossible for the API methods themselves to fully determine). Then, make `sqlite_utils/cli.py` begin and commit a transaction in each `@cli.command` function, making each command robust and consistent in the face of errors. The big change here, and why I didn't just submit a patch, is that it dramatically changes the Python API to _require_ callers to begin a transaction rather than just immediately calling methods.\r\n\r\nThere is also the caveat that for each transaction, an explicit `BEGIN` is also necessary so that DDL as well as DML (as well as `SELECT`s) are consistent and rolled back on error. There are several bugs.python.org discussions around this particular problem of DDL and some plans to make it better and consistent with DBAPI2, eventually. In the meantime, the sqlite-utils Database class could be a context manager which supports the incantations necessary to do proper transactions. This would still be a Python API change for callers but wouldn't expose them to the weirdness of the sqlite3's default transaction handling.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 651844316, "label": "Add insert --truncate option"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/pull/118#issuecomment-655052451", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/118", "id": 655052451, "node_id": "MDEyOklzc3VlQ29tbWVudDY1NTA1MjQ1MQ==", "user": {"value": 79913, "label": "tsibley"}, "created_at": "2020-07-07T18:45:23Z", "updated_at": "2020-07-07T18:45:23Z", "author_association": "CONTRIBUTOR", "body": "Ah, I see the problem. The truncate is inside a loop I didn't realize was there.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 651844316, "label": "Add insert --truncate option"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/pull/118#issuecomment-655018966", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/118", "id": 655018966, "node_id": "MDEyOklzc3VlQ29tbWVudDY1NTAxODk2Ng==", "user": {"value": 79913, "label": "tsibley"}, "created_at": "2020-07-07T17:41:06Z", "updated_at": "2020-07-07T17:41:06Z", "author_association": "CONTRIBUTOR", "body": "Hmm, while tests pass, this may not work as intended on larger datasets. Looking into it.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 651844316, "label": "Add insert --truncate option"}, "performed_via_github_app": null}