github
html_url | issue_url | id | node_id | user | created_at | updated_at | author_association | body | reactions | issue | performed_via_github_app |
---|---|---|---|---|---|---|---|---|---|---|---|
https://github.com/simonw/sqlite-utils/issues/103#issuecomment-622587177 | https://api.github.com/repos/simonw/sqlite-utils/issues/103 | 622587177 | MDEyOklzc3VlQ29tbWVudDYyMjU4NzE3Nw== | 9599 | 2020-05-01T22:07:51Z | 2020-05-01T22:07:51Z | OWNER | This is my failed attempt to recreate the bug (plus some extra debugging output): ```diff % git diff diff --git a/sqlite_utils/db.py b/sqlite_utils/db.py index dd49d5c..ea42aea 100644 --- a/sqlite_utils/db.py +++ b/sqlite_utils/db.py @@ -1013,7 +1013,11 @@ class Table(Queryable): assert ( num_columns <= SQLITE_MAX_VARS ), "Rows can have a maximum of {} columns".format(SQLITE_MAX_VARS) + print("default batch_size = ", batch_size) batch_size = max(1, min(batch_size, SQLITE_MAX_VARS // num_columns)) + print("new batch_size = {},num_columns = {}, MAX_VARS // num_columns = {}".format( + batch_size, num_columns, SQLITE_MAX_VARS // num_columns + )) self.last_rowid = None self.last_pk = None for chunk in chunks(itertools.chain([first_record], records), batch_size): @@ -1124,6 +1128,9 @@ class Table(Queryable): ) flat_values = list(itertools.chain(*values)) queries_and_params = [(sql, flat_values)] + print(sql.count("?"), len(flat_values)) + + # print(json.dumps(queries_and_params, indent=4)) with self.db.conn: for query, params in queries_and_params: diff --git a/tests/test_create.py b/tests/test_create.py index 5290cd8..52940df 100644 --- a/tests/test_create.py +++ b/tests/test_create.py @@ -853,3 +853,33 @@ def test_create_with_nested_bytes(fresh_db): record = {"id": 1, "data": {"foo": b"bytes"}} fresh_db["t"].insert(record) assert [{"id": 1, "data": '{"foo": "b\'bytes\'"}'}] == list(fresh_db["t"].rows) + + +def test_create_throws_useful_error_with_increasing_number_of_columns(fresh_db): + # https://github.com/simonw/sqlite-utils/issues/103 + def rows(): + yield {"name": 0} + for i in range(1, 1001): + yield { + "name": i, + "age": i, + "size": i, + "name2": i, … | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, 
"confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | 610517472 | |
https://github.com/simonw/sqlite-utils/issues/103#issuecomment-622584433 | https://api.github.com/repos/simonw/sqlite-utils/issues/103 | 622584433 | MDEyOklzc3VlQ29tbWVudDYyMjU4NDQzMw== | 9599 | 2020-05-01T21:57:52Z | 2020-05-01T21:57:52Z | OWNER | @b0b5h4rp13 I'm having trouble creating a test that triggers this bug. Could you share a chunk of code that replicates what you're seeing here? | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | 610517472 | |
https://github.com/simonw/sqlite-utils/issues/103#issuecomment-622565276 | https://api.github.com/repos/simonw/sqlite-utils/issues/103 | 622565276 | MDEyOklzc3VlQ29tbWVudDYyMjU2NTI3Ng== | 9599 | 2020-05-01T20:57:16Z | 2020-05-01T20:57:16Z | OWNER | I'm reconsidering this: I think this is going to happen ANY time someone has at least one row that is wider than the first row. So at the very least I should show a more understandable error message. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | 610517472 | |
https://github.com/simonw/sqlite-utils/issues/103#issuecomment-622563188 | https://api.github.com/repos/simonw/sqlite-utils/issues/103 | 622563188 | MDEyOklzc3VlQ29tbWVudDYyMjU2MzE4OA== | 9599 | 2020-05-01T20:51:24Z | 2020-05-01T20:51:29Z | OWNER | Hopefully anyone who runs into this problem in the future will search for and find this issue thread! | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | 610517472 | |
https://github.com/simonw/sqlite-utils/issues/103#issuecomment-622563059 | https://api.github.com/repos/simonw/sqlite-utils/issues/103 | 622563059 | MDEyOklzc3VlQ29tbWVudDYyMjU2MzA1OQ== | 9599 | 2020-05-01T20:51:01Z | 2020-05-01T20:51:01Z | OWNER | I'm not sure what to do about this. I was thinking the solution would be to look at ALL of the rows in a batch before deciding on the maximum number of columns, but that doesn't work because we calculate batch size based on the number of columns! I think my recommendation here is to manually pass a `batch_size=` argument to `.insert_all()` if you run into this error. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | 610517472 | |
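The failure mode discussed in these comments comes down to simple arithmetic. A minimal sketch in plain Python, assuming `SQLITE_MAX_VARS` is 999 (SQLite's default variable limit, also used by sqlite-utils) and the library's default `batch_size` of 100:

```python
# Minimal sketch of the "too many SQL variables" failure mode.
# Assumes SQLITE_MAX_VARS = 999 (SQLite's default limit) and a
# default batch_size of 100, matching sqlite-utils at the time.
SQLITE_MAX_VARS = 999
DEFAULT_BATCH_SIZE = 100

first_row_columns = 1    # e.g. the first row is just {"name": 0}
widest_row_columns = 10  # a later row in the same batch has 10 keys

# batch_size is capped using the FIRST row's column count only:
batch_size = max(1, min(DEFAULT_BATCH_SIZE, SQLITE_MAX_VARS // first_row_columns))
print(batch_size)  # 100 - the first row is narrow, so no capping happens

# A multi-row INSERT needs one "?" placeholder per column per row, and a
# wider row forces every row in the batch up to the widest width:
placeholders = batch_size * widest_row_columns
print(placeholders > SQLITE_MAX_VARS)  # True - 1000 placeholders exceed 999
```

Passing an explicit `batch_size=` to `.insert_all()` (as recommended in the comment above) keeps the placeholder count under the limit, since even the widest row times the smaller batch size stays below 999.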
https://github.com/simonw/sqlite-utils/issues/103#issuecomment-622561944 | https://api.github.com/repos/simonw/sqlite-utils/issues/103 | 622561944 | MDEyOklzc3VlQ29tbWVudDYyMjU2MTk0NA== | 9599 | 2020-05-01T20:47:51Z | 2020-05-01T20:47:51Z | OWNER | Yup, we only take the number of columns in the first record into account at the moment: https://github.com/simonw/sqlite-utils/blob/d56029549acae0b0ea94c5a0f783e3b3895d9218/sqlite_utils/db.py#L1007-L1016 | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | 610517472 | |
https://github.com/simonw/sqlite-utils/issues/103#issuecomment-622561585 | https://api.github.com/repos/simonw/sqlite-utils/issues/103 | 622561585 | MDEyOklzc3VlQ29tbWVudDYyMjU2MTU4NQ== | 9599 | 2020-05-01T20:46:50Z | 2020-05-01T20:46:50Z | OWNER | The varying number of columns thing is interesting - I don't think the tests cover that case much, if at all. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | 610517472 | |
https://github.com/simonw/sqlite-utils/issues/105#issuecomment-622558889 | https://api.github.com/repos/simonw/sqlite-utils/issues/105 | 622558889 | MDEyOklzc3VlQ29tbWVudDYyMjU1ODg4OQ== | 9599 | 2020-05-01T20:40:06Z | 2020-05-01T20:40:06Z | OWNER | Documentation: https://sqlite-utils.readthedocs.io/en/latest/cli.html#listing-views | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | 610853576 | |
https://github.com/simonw/datasette/issues/749#issuecomment-622450636 | https://api.github.com/repos/simonw/datasette/issues/749 | 622450636 | MDEyOklzc3VlQ29tbWVudDYyMjQ1MDYzNg== | 9599 | 2020-05-01T16:08:46Z | 2020-05-01T16:08:46Z | OWNER | Proposed solution: on Cloud Run, don't show the "download database" link if the database file is larger than 32MB. I can do this with a new config setting, `max_db_mb`, which is automatically set by the `publish cloudrun` command. This is consistent with the existing `max_csv_mb` setting: https://datasette.readthedocs.io/en/stable/config.html#max-csv-mb I should set `max_csv_mb` to 32MB on Cloud Run deploys as well. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | 610829227 | |
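The proposed check boils down to comparing the database file's size against the limit before rendering the link. A hypothetical sketch — `max_db_mb` was only a proposal at this point, and the helper below is illustrative, not Datasette's actual API:

```python
import os

def show_download_link(db_path, max_db_mb=32):
    """Hypothetical helper: only offer the "download database" link
    when the file fits under Cloud Run's 32MB response limit."""
    size_mb = os.path.getsize(db_path) / (1024 * 1024)
    return size_mb <= max_db_mb
```

On a Cloud Run deploy, `publish cloudrun` would set the limit automatically; elsewhere the setting would default to unlimited.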