{"html_url": "https://github.com/simonw/sqlite-utils/pull/146#issuecomment-688481317", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/146", "id": 688481317, "node_id": "MDEyOklzc3VlQ29tbWVudDY4ODQ4MTMxNw==", "user": {"value": 96218, "label": "simonwiles"}, "created_at": "2020-09-07T19:18:55Z", "updated_at": "2020-09-07T19:18:55Z", "author_association": "CONTRIBUTOR", "body": "Just force-pushed to update d042f9c with more formatting changes to satisfy `black==20.8b1` and pass the GitHub Actions \"Test\" workflow.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 688668680, "label": "Handle case where subsequent records (after first batch) include extra columns"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/pull/146#issuecomment-688479163", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/146", "id": 688479163, "node_id": "MDEyOklzc3VlQ29tbWVudDY4ODQ3OTE2Mw==", "user": {"value": 96218, "label": "simonwiles"}, "created_at": "2020-09-07T19:10:33Z", "updated_at": "2020-09-07T19:11:57Z", "author_association": "CONTRIBUTOR", "body": "@simonw -- I've gone ahead updated the documentation to reflect the changes introduced in this PR. IMO it's ready to merge now.\r\n\r\nIn writing the documentation changes, I begin to wonder about the value and role of `batch_size` at all, tbh. May I assume it was originally intended to prevent using the entire row set to determine columns and column types, and that this was a performance consideration? If so, this PR entirely undermines its purpose. I've been passing in excess of 500,000 rows at a time to `insert_all()` with these changes and although I'm sure the performance difference is measurable it's not really noticeable; given #145, I don't know that any performance advantages outweigh the problems doing it this way removes. What do you think about just dropping the argument and defaulting to the maximum `batch_size` permissible given `SQLITE_MAX_VARS`? Are there other reasons one might want to restrict `batch_size` that I've overlooked? 
{"html_url": "https://github.com/simonw/sqlite-utils/issues/145#issuecomment-683382252", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/145", "id": 683382252, "node_id": "MDEyOklzc3VlQ29tbWVudDY4MzM4MjI1Mg==", "user": {"value": 96218, "label": "simonwiles"}, "created_at": "2020-08-30T06:27:25Z", "updated_at": "2020-08-30T06:27:52Z", "author_association": "CONTRIBUTOR", "body": "Note: had to adjust the test above because trying to exhaust a `SQLITE_MAX_VARIABLE_NUMBER` of 250000 in 99 records requires 2526 columns, and trips the `\"Rows can have a maximum of {} columns\".format(SQLITE_MAX_VARS)` check even before it trips the default `SQLITE_MAX_COLUMN` value (2000).", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 688659182, "label": "Bug when first record contains fewer columns than subsequent records"}, "performed_via_github_app": null}
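Spelling out the arithmetic behind the figures quoted in that comment (the constant names simply echo the values named there, not imports from sqlite-utils):

```python
import math

# With the variable limit raised to 250000, exhausting it in only
# 99 records needs ceil(250000 / 99) columns per row.
SQLITE_MAX_VARIABLE_NUMBER = 250_000
records = 99
columns = math.ceil(SQLITE_MAX_VARIABLE_NUMBER / records)
assert columns == 2526

# ...which already exceeds SQLite's default column limit, so the
# "Rows can have a maximum of {} columns" check fires first.
SQLITE_MAX_COLUMN = 2000  # SQLite's default compile-time value
assert columns > SQLITE_MAX_COLUMN
```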
{"html_url": "https://github.com/simonw/sqlite-utils/issues/139#issuecomment-682815377", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/139", "id": 682815377, "node_id": "MDEyOklzc3VlQ29tbWVudDY4MjgxNTM3Nw==", "user": {"value": 96218, "label": "simonwiles"}, "created_at": "2020-08-28T16:14:58Z", "updated_at": "2020-08-28T16:14:58Z", "author_association": "CONTRIBUTOR", "body": "Thanks! And yeah, I had updating the docs on my list too :) Will try to get to it this afternoon (budgeting time is fraught with uncertainty at the moment!).", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 686978131, "label": "insert_all(..., alter=True) should work for new columns introduced after the first 100 records"}, "performed_via_github_app": null}
{"html_url": "https://github.com/simonw/sqlite-utils/issues/139#issuecomment-682182178", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/139", "id": 682182178, "node_id": "MDEyOklzc3VlQ29tbWVudDY4MjE4MjE3OA==", "user": {"value": 96218, "label": "simonwiles"}, "created_at": "2020-08-27T20:46:18Z", "updated_at": "2020-08-27T20:46:18Z", "author_association": "CONTRIBUTOR", "body": "> I tried changing the batch_size argument to the total number of records, but it seems only to affect the number of rows that are committed at a time, and has no influence on this problem.\r\n\r\nSo the reason for this is that the `batch_size` for import is limited (of necessity) here: https://github.com/simonw/sqlite-utils/blob/main/sqlite_utils/db.py#L1048\r\n\r\nWith regard to the issue of ignoring columns, however, I made a fork and hacked together a temporary fix that looks like this:\r\nhttps://github.com/simonwiles/sqlite-utils/commit/3901f43c6a712a1a3efc340b5b8d8fd0cbe8ee63\r\n\r\nIt doesn't seem to affect performance enormously (though I've not tested it thoroughly), and it now does what I need (and would expect, tbh), but it fails the test here:\r\nhttps://github.com/simonw/sqlite-utils/blob/main/tests/test_create.py#L710-L716\r\n\r\nThe existence of this test suggests that `insert_all()` is behaving as intended, of course. It seems odd to me that this would be a desirable default behaviour (let alone the only behaviour), and it's not very prominently flagged up, either.\r\n\r\n@simonw is this something you'd be willing to look at a PR for? I assume you wouldn't want to change the default behaviour at this point, but perhaps an option could be provided, or at least a bit more of a warning in the docs. Are there any oversights I've made in the implementation?\r\n\r\nWould be grateful for your thoughts! Thanks!\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 686978131, "label": "insert_all(..., alter=True) should work for new columns introduced after the first 100 records"}, "performed_via_github_app": null}
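The fix discussed in the last comment stops `insert_all()` from silently dropping columns that first appear after the initial batch. A hypothetical illustration of the general approach, not the code in the linked commit: build the column list from every record in a batch rather than only the first one.

```python
# Union of column names across all records, preserving first-seen order,
# so rows that introduce extra columns are not silently truncated.
def all_columns(records):
    seen = {}
    for record in records:
        for key in record:
            seen.setdefault(key, None)
    return list(seen)


rows = [{"id": 1}, {"id": 2, "name": "extra"}]
assert all_columns(rows) == ["id", "name"]
```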