{"html_url": "https://github.com/simonw/sqlite-utils/issues/139#issuecomment-683178570", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/139", "id": 683178570, "node_id": "MDEyOklzc3VlQ29tbWVudDY4MzE3ODU3MA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-08-28T22:48:51Z", "updated_at": "2020-08-28T22:48:51Z", "author_association": "OWNER", "body": "Thanks @simonwiles, this is now released in 2.16.1: https://sqlite-utils.readthedocs.io/en/stable/changelog.html", "reactions": "{\"total_count\": 2, \"+1\": 1, \"-1\": 0, \"laugh\": 0, \"hooray\": 1, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 686978131, "label": "insert_all(..., alter=True) should work for new columns introduced after the first 100 records"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/139#issuecomment-682815377", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/139", "id": 682815377, "node_id": "MDEyOklzc3VlQ29tbWVudDY4MjgxNTM3Nw==", "user": {"value": 96218, "label": "simonwiles"}, "created_at": "2020-08-28T16:14:58Z", "updated_at": "2020-08-28T16:14:58Z", "author_association": "CONTRIBUTOR", "body": "Thanks! 
And yeah, I had updating the docs on my list too :) Will try to get to it this afternoon (budgeting time is fraught with uncertainty at the moment!).", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 686978131, "label": "insert_all(..., alter=True) should work for new columns introduced after the first 100 records"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/139#issuecomment-682771226", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/139", "id": 682771226, "node_id": "MDEyOklzc3VlQ29tbWVudDY4Mjc3MTIyNg==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-08-28T15:57:42Z", "updated_at": "2020-08-28T15:57:42Z", "author_association": "OWNER", "body": "That pull request should update this section of the documentation too:\r\n\r\n> If you have more than one record to insert, the insert_all() method is a much more efficient way of inserting them. 
Just like insert() it will automatically detect the columns that should be created, but it will inspect the first batch of 100 items to help decide what those column types should be.\r\n\r\nhttps://github.com/simonw/sqlite-utils/blob/ea87c2b943fdd162c42a900ac0aea5ecc2f4b9d9/docs/python-api.rst#L393", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 686978131, "label": "insert_all(..., alter=True) should work for new columns introduced after the first 100 records"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/139#issuecomment-682762911", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/139", "id": 682762911, "node_id": "MDEyOklzc3VlQ29tbWVudDY4Mjc2MjkxMQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-08-28T15:54:57Z", "updated_at": "2020-08-28T15:55:20Z", "author_association": "OWNER", "body": "Here's a suggested test update:\r\n```diff\r\ndiff --git a/sqlite_utils/db.py b/sqlite_utils/db.py\r\nindex a8791c3..12fa2f2 100644\r\n--- a/sqlite_utils/db.py\r\n+++ b/sqlite_utils/db.py\r\n@@ -1074,6 +1074,13 @@ class Table(Queryable):\r\n all_columns = list(sorted(all_columns))\r\n if hash_id:\r\n all_columns.insert(0, hash_id)\r\n+ else:\r\n+ all_columns += [\r\n+ column\r\n+ for record in chunk\r\n+ for column in record\r\n+ if column not in all_columns\r\n+ ]\r\n validate_column_names(all_columns)\r\n first = False\r\n # values is the list of insert data that is passed to the\r\ndiff --git a/tests/test_create.py b/tests/test_create.py\r\nindex a84eb8d..3a7fafc 100644\r\n--- a/tests/test_create.py\r\n+++ b/tests/test_create.py\r\n@@ -707,13 +707,15 @@ def test_insert_thousands_using_generator(fresh_db):\r\n assert 10000 == fresh_db[\"test\"].count\r\n \r\n \r\n-def test_insert_thousands_ignores_extra_columns_after_first_100(fresh_db):\r\n+def 
test_insert_thousands_adds_extra_columns_after_first_100(fresh_db):\r\n+ # https://github.com/simonw/sqlite-utils/issues/139\r\n fresh_db[\"test\"].insert_all(\r\n [{\"i\": i, \"word\": \"word_{}\".format(i)} for i in range(100)]\r\n- + [{\"i\": 101, \"extra\": \"This extra column should cause an exception\"}]\r\n+ + [{\"i\": 101, \"extra\": \"Should trigger ALTER\"}],\r\n+ alter=True,\r\n )\r\n rows = fresh_db.execute_returning_dicts(\"select * from test where i = 101\")\r\n- assert [{\"i\": 101, \"word\": None}] == rows\r\n+ assert [{\"i\": 101, \"word\": None, \"extra\": \"Should trigger ALTER\"}] == rows\r\n \r\n \r\n def test_insert_ignore(fresh_db):\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 686978131, "label": "insert_all(..., alter=True) should work for new columns introduced after the first 100 records"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/139#issuecomment-682285212", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/139", "id": 682285212, "node_id": "MDEyOklzc3VlQ29tbWVudDY4MjI4NTIxMg==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-08-28T02:12:51Z", "updated_at": "2020-08-28T02:12:51Z", "author_association": "OWNER", "body": "I'd be happy to accept a PR for this, provided it included updated unit tests that illustrate it working. 
I think this is a really good improvement.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 686978131, "label": "insert_all(..., alter=True) should work for new columns introduced after the first 100 records"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/139#issuecomment-682284908", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/139", "id": 682284908, "node_id": "MDEyOklzc3VlQ29tbWVudDY4MjI4NDkwOA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-08-28T02:11:40Z", "updated_at": "2020-08-28T02:11:40Z", "author_association": "OWNER", "body": "This is deliberate behaviour, but I'm not at all attached to it - you're right in pointing out that it's actually pretty unexpected.\r\n\r\nI'd be happy to change this behaviour so if you pass `alter=True` and then use `.insert_all()` on more than 100 rows it works as you would expect, instead of silently ignoring new columns past the first 100 rows. 
I don't expect that anyone would be depending on the current behaviour (ignore new columns after the first 100) such that this should be considered a backwards incompatible change.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 686978131, "label": "insert_all(..., alter=True) should work for new columns introduced after the first 100 records"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/139#issuecomment-682182178", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/139", "id": 682182178, "node_id": "MDEyOklzc3VlQ29tbWVudDY4MjE4MjE3OA==", "user": {"value": 96218, "label": "simonwiles"}, "created_at": "2020-08-27T20:46:18Z", "updated_at": "2020-08-27T20:46:18Z", "author_association": "CONTRIBUTOR", "body": "> I tried changing the batch_size argument to the total number of records, but it seems only to effect the number of rows that are committed at a time, and has no influence on this problem.\r\n\r\nSo the reason for this is that the `batch_size` for import is limited (of necessity) here: https://github.com/simonw/sqlite-utils/blob/main/sqlite_utils/db.py#L1048\r\n\r\nWith regard to the issue of ignoring columns, however, I made a fork and hacked a temporary fix that looks like this:\r\nhttps://github.com/simonwiles/sqlite-utils/commit/3901f43c6a712a1a3efc340b5b8d8fd0cbe8ee63\r\n\r\nIt doesn't seem to affect performance enormously (but I've not tested it thoroughly), and it now does what I need (and would expect, tbh), but it now fails the test here:\r\nhttps://github.com/simonw/sqlite-utils/blob/main/tests/test_create.py#L710-L716\r\n\r\nThe existence of this test suggests that `insert_all()` is behaving as intended, of course. 
It seems odd to me that this would be a desirable default behaviour (let alone the only behaviour), and it's not very prominently flagged up, either.\r\n\r\n@simonw is this something you'd be willing to look at a PR for? I assume you wouldn't want to change the default behaviour at this point, but perhaps an option could be provided, or at least a bit more of a warning in the docs. Are there oversights in the implementation that I've made?\r\n\r\nWould be grateful for your thoughts! Thanks!\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 686978131, "label": "insert_all(..., alter=True) should work for new columns introduced after the first 100 records"}, "performed_via_github_app": null}