issues: 797159961

This data as json

id	node_id	number	title	user	state	locked	assignee	milestone	comments	created_at	updated_at	closed_at	author_association	pull_request	body	repo	type	active_lock_reason	performed_via_github_app	reactions	draft	state_reason
797159961	MDExOlB1bGxSZXF1ZXN0NTY0MjE1MDEx	225	fix for problem in Table.insert_all on search for columns per chunk of rows	261237	closed	0			2	2021-01-29T20:16:07Z	2021-02-14T21:04:13Z	2021-02-14T21:04:13Z	NONE	simonw/sqlite-utils/pulls/225	Hi, I ran into a problem when trying to create a database from my Apple Healthkit data using healthkit-to-sqlite. The program crashed because of an invalid insert statement that was generated for table `rDistanceCycling`. The actual problem turned out to be in sqlite-utils. `Table.insert_all` processes the data to be inserted in chunks of rows and checks for every chunk which columns are used, and it will collect all column names in the variable `all_columns`. The collection of columns is done using a nested list comprehension that is not completely correct. I'm using a Windows machine and had to make a few adjustments to the tests in order to be able to run them because they had a posix dependency. Thanks, kind regards, Frans ``` this is a (condensed) chunk of data from my Apple healthkit export that caused the problem. the 3 last items in the chunk have additional keys: metadata_HKMetadataKeySyncVersion and metadata_HKMetadataKeySyncIdentifier chunk = [{'sourceName': 'AppleÂ\xa0Watch van Frans', 'sourceVersion': '7.0.1', 'device': '<<HKDevice: 0x281cf6c70>, name:Apple Watch, manufacturer:Apple Inc., model:Watch, hardware:Watch3,4, software:7.0.1>', 'unit': 'km', 'creationDate': '2020-10-10 12:29:09 +0100', 'startDate': '2020-10-10 12:29:06 +0100', 'endDate': '2020-10-10 12:29:07 +0100', 'value': '0.00518016'}, {'sourceName': 'AppleÂ\xa0Watch van Frans', 'sourceVersion': '7.0.1', 'device': '<<HKDevice: 0x281cf6c70>, name:Apple Watch, manufacturer:Apple Inc., model:Watch, hardware:Watch3,4, software:7.0.1>', 'unit': 'km', 'creationDate': '2020-10-10 12:29:10 +0100', 'startDate': '2020-10-10 12:29:07 +0100', 'endDate': '2020-10-10 12:29:08 +0100', 'value': '0.00544049'}, {'sourceName': 'AppleÂ\xa0Watch van Frans', 'sourceVersion': '6.2.6', 'device': '<<HKDevice: 0x281cf83e0>, name:Apple Watch, manufacturer:Apple Inc., model:Watch, hardware:Watch3,4, software:6.2.6>', 'unit': 'km', 'creationDate': '2020-10-14 05:54:12 +0100', 'startDate': '2020-07-15 16:40:50 +0100', 'endDate': '2020-07-15 16:42:49 +0100', 'value': '0.952092', 'metadata_HKMetadataKeySyncVersion': '1', 'metadata_HKMetadataKeySyncIdentifier': '3:674DBCDB-3FE8-40D1-9FC1-E54A2B413805:616520450.99823:616520569.99360:119'}, {'sourceName': 'AppleÂ\xa0Watch van Frans', 'sourceVersion': '6.2.6', 'device': '<<HKDevice: 0x281cf83e0>, name:Apple Watch, manufacturer:Apple Inc., model:Watch, hardware:Watch3,4, software:6.2.6>', 'unit': 'km', 'creationDate': '2020-10-14 05:54:12 +0100', 'startDate': '2020-07-15 16:42:49 +0100', 'endDate': '2020-07-15 16:44:51 +0100', 'value': '0.848983', 'metadata_HKMetadataKeySyncVersion': '1', 'metadata_HKMetadataKeySyncIdentifier': '3:674DBCDB-3FE8-40D1-9FC1-E54A2B413805:616520569.99360:616520691.98826:119'}, {'sourceName': 'AppleÂ\xa0Watch van Frans', 'sourceVersion': '6.2.6', 'device': '<<HKDevice: 0x281cf83e0>, name:Apple Watch, manufacturer:Apple Inc., model:Watch, hardware:Watch3,4, software:6.2.6>', 'unit': 'km', 'creationDate': '2020-10-14 05:54:12 +0100', 'startDate': '2020-07-15 16:44:51 +0100', 'endDate': '2020-07-15 16:46:50 +0100', 'value': '0.834403', 'metadata_HKMetadataKeySyncVersion': '1', 'metadata_HKMetadataKeySyncIdentifier': '3:674DBCDB-3FE8-40D1-9FC1-E54A2B413805:616520691.98826:616520810.98305:119'}] def all_columns_old(): all_columns = [col for col in chunk[0]] all_columns += [column for record in chunk for column in record if column not in all_columns] return all_columns def all_columns_new(): all_columns = [col for col in chunk[0]] for record in chunk: all_columns += [column for column in record if column not in all_columns] return all_columns if name == 'main': from pprint import pprint `print('problem: ') pprint(all_columns_old()) print('\nfix: ') pprint(all_columns_new())` ```	140912432	pull			{ "url": "https://api.github.com/repos/simonw/sqlite-utils/issues/225/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	0

Links from other tables

0 rows from issues_id in issues_labels
2 rows from issue in issue_comments