github
html_url | issue_url | id | node_id | user | created_at | updated_at | author_association | body | reactions | issue | performed_via_github_app |
---|---|---|---|---|---|---|---|---|---|---|---|
https://github.com/simonw/sqlite-utils/pull/146#issuecomment-688481317 | https://api.github.com/repos/simonw/sqlite-utils/issues/146 | 688481317 | MDEyOklzc3VlQ29tbWVudDY4ODQ4MTMxNw== | 96218 | 2020-09-07T19:18:55Z | 2020-09-07T19:18:55Z | CONTRIBUTOR | Just force-pushed to update d042f9c with more formatting changes to satisfy `black==20.8b1` and pass the GitHub Actions "Test" workflow. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
688668680 | |
https://github.com/simonw/sqlite-utils/pull/146#issuecomment-688479163 | https://api.github.com/repos/simonw/sqlite-utils/issues/146 | 688479163 | MDEyOklzc3VlQ29tbWVudDY4ODQ3OTE2Mw== | 96218 | 2020-09-07T19:10:33Z | 2020-09-07T19:11:57Z | CONTRIBUTOR | @simonw -- I've gone ahead updated the documentation to reflect the changes introduced in this PR. IMO it's ready to merge now. In writing the documentation changes, I begin to wonder about the value and role of `batch_size` at all, tbh. May I assume it was originally intended to prevent using the entire row set to determine columns and column types, and that this was a performance consideration? If so, this PR entirely undermines its purpose. I've been passing in excess of 500,000 rows at a time to `insert_all()` with these changes and although I'm sure the performance difference is measurable it's not really noticeable; given #145, I don't know that any performance advantages outweigh the problems doing it this way removes. What do you think about just dropping the argument and defaulting to the maximum `batch_size` permissible given `SQLITE_MAX_VARS`? Are there other reasons one might want to restrict `batch_size` that I've overlooked? I could open a new issue to discuss/implement this. Of course the documentation will need to change again too if/when something is done about #147. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
688668680 | |
https://github.com/simonw/sqlite-utils/issues/145#issuecomment-683382252 | https://api.github.com/repos/simonw/sqlite-utils/issues/145 | 683382252 | MDEyOklzc3VlQ29tbWVudDY4MzM4MjI1Mg== | 96218 | 2020-08-30T06:27:25Z | 2020-08-30T06:27:52Z | CONTRIBUTOR | Note: had to adjust the test above because trying to exhaust a `SQLITE_MAX_VARIABLE_NUMBER` of 250000 in 99 records requires 2526 columns, and trips the ` "Rows can have a maximum of {} columns".format(SQLITE_MAX_VARS)` check even before it trips the default `SQLITE_MAX_COLUMN` value (2000). | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
688659182 | |
https://github.com/simonw/sqlite-utils/issues/139#issuecomment-682815377 | https://api.github.com/repos/simonw/sqlite-utils/issues/139 | 682815377 | MDEyOklzc3VlQ29tbWVudDY4MjgxNTM3Nw== | 96218 | 2020-08-28T16:14:58Z | 2020-08-28T16:14:58Z | CONTRIBUTOR | Thanks! And yeah, I had updating the docs on my list too :) Will try to get to it this afternoon (budgeting time is fraught with uncertainty at the moment!). | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
686978131 | |
https://github.com/simonw/sqlite-utils/issues/139#issuecomment-682182178 | https://api.github.com/repos/simonw/sqlite-utils/issues/139 | 682182178 | MDEyOklzc3VlQ29tbWVudDY4MjE4MjE3OA== | 96218 | 2020-08-27T20:46:18Z | 2020-08-27T20:46:18Z | CONTRIBUTOR | > I tried changing the batch_size argument to the total number of records, but it seems only to effect the number of rows that are committed at a time, and has no influence on this problem. So the reason for this is that the `batch_size` for import is limited (of necessity) here: https://github.com/simonw/sqlite-utils/blob/main/sqlite_utils/db.py#L1048 With regard to the issue of ignoring columns, however, I made a fork and hacked a temporary fix that looks like this: https://github.com/simonwiles/sqlite-utils/commit/3901f43c6a712a1a3efc340b5b8d8fd0cbe8ee63 It doesn't seem to affect performance enormously (but I've not tested it thoroughly), and it now does what I need (and would expect, tbh), but it now fails the test here: https://github.com/simonw/sqlite-utils/blob/main/tests/test_create.py#L710-L716 The existence of this test suggests that `insert_all()` is behaving as intended, of course. It seems odd to me that this would be a desirable default behaviour (let alone the only behaviour), and its not very prominently flagged-up, either. @simonw is this something you'd be willing to look at a PR for? I assume you wouldn't want to change the default behaviour at this point, but perhaps an option could be provided, or at least a bit more of a warning in the docs. Are there oversights in the implementation that I've made? Would be grateful for your thoughts! Thanks! | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
686978131 |