{"id": 797159961, "node_id": "MDExOlB1bGxSZXF1ZXN0NTY0MjE1MDEx", "number": 225, "title": "fix for problem in Table.insert_all on search for columns per chunk of rows", "user": {"value": 261237, "label": "nieuwenhoven"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 2, "created_at": "2021-01-29T20:16:07Z", "updated_at": "2021-02-14T21:04:13Z", "closed_at": "2021-02-14T21:04:13Z", "author_association": "NONE", "pull_request": "simonw/sqlite-utils/pulls/225", "body": "Hi,\r\n\r\nI ran into a problem when trying to create a database from my Apple Healthkit data using [healthkit-to-sqlite](https://github.com/dogsheep/healthkit-to-sqlite). The program crashed because of an invalid insert statement that was generated for table `rDistanceCycling`. \r\n\r\nThe actual problem turned out to be in [sqlite-utils](https://github.com/simonw/sqlite-utils). `Table.insert_all` processes the data to be inserted in chunks of rows and checks for every chunk which columns are used, and it will collect all column names in the variable `all_columns`. The collection of columns is done using a nested list comprehension that is not completely correct. \r\n\r\nI'm using a Windows machine and had to make a few adjustments to the tests in order to be able to run them because they had a posix dependency.\r\n\r\nThanks, kind regards,\r\n\r\nFrans\r\n\r\n```\r\n# this is a (condensed) chunk of data from my Apple healthkit export that caused the problem.\r\n# the 3 last items in the chunk have additional keys: metadata_HKMetadataKeySyncVersion and metadata_HKMetadataKeySyncIdentifier\r\n\r\nchunk = [{'sourceName': 'Apple\u00c2\\xa0Watch van Frans', 'sourceVersion': '7.0.1',\r\n 'device': '<, name:Apple Watch, manufacturer:Apple Inc., model:Watch, hardware:Watch3,4, software:7.0.1>',\r\n 'unit': 'km', 'creationDate': '2020-10-10 12:29:09 +0100', 'startDate': '2020-10-10 12:29:06 +0100',\r\n 'endDate': '2020-10-10 12:29:07 +0100', 'value': '0.00518016'},\r\n {'sourceName': 'Apple\u00c2\\xa0Watch van Frans', 'sourceVersion': '7.0.1',\r\n 'device': '<, name:Apple Watch, manufacturer:Apple Inc., model:Watch, hardware:Watch3,4, software:7.0.1>',\r\n 'unit': 'km', 'creationDate': '2020-10-10 12:29:10 +0100', 'startDate': '2020-10-10 12:29:07 +0100',\r\n 'endDate': '2020-10-10 12:29:08 +0100', 'value': '0.00544049'},\r\n {'sourceName': 'Apple\u00c2\\xa0Watch van Frans', 'sourceVersion': '6.2.6',\r\n 'device': '<, name:Apple Watch, manufacturer:Apple Inc., model:Watch, hardware:Watch3,4, software:6.2.6>',\r\n 'unit': 'km', 'creationDate': '2020-10-14 05:54:12 +0100', 'startDate': '2020-07-15 16:40:50 +0100',\r\n 'endDate': '2020-07-15 16:42:49 +0100', 'value': '0.952092', 'metadata_HKMetadataKeySyncVersion': '1',\r\n 'metadata_HKMetadataKeySyncIdentifier': '3:674DBCDB-3FE8-40D1-9FC1-E54A2B413805:616520450.99823:616520569.99360:119'},\r\n {'sourceName': 'Apple\u00c2\\xa0Watch van Frans', 'sourceVersion': '6.2.6',\r\n 'device': '<, name:Apple Watch, manufacturer:Apple Inc., model:Watch, hardware:Watch3,4, software:6.2.6>',\r\n 'unit': 'km', 'creationDate': '2020-10-14 05:54:12 +0100', 'startDate': '2020-07-15 16:42:49 +0100',\r\n 'endDate': '2020-07-15 16:44:51 +0100', 'value': '0.848983', 'metadata_HKMetadataKeySyncVersion': '1',\r\n 'metadata_HKMetadataKeySyncIdentifier': '3:674DBCDB-3FE8-40D1-9FC1-E54A2B413805:616520569.99360:616520691.98826:119'},\r\n {'sourceName': 'Apple\u00c2\\xa0Watch van Frans', 'sourceVersion': '6.2.6',\r\n 'device': '<, name:Apple Watch, manufacturer:Apple Inc., model:Watch, hardware:Watch3,4, software:6.2.6>',\r\n 'unit': 'km', 'creationDate': '2020-10-14 05:54:12 +0100', 'startDate': '2020-07-15 16:44:51 +0100',\r\n 'endDate': '2020-07-15 16:46:50 +0100', 'value': '0.834403', 'metadata_HKMetadataKeySyncVersion': '1',\r\n 'metadata_HKMetadataKeySyncIdentifier': '3:674DBCDB-3FE8-40D1-9FC1-E54A2B413805:616520691.98826:616520810.98305:119'}]\r\n\r\n\r\n\r\ndef all_columns_old():\r\n all_columns = [col for col in chunk[0]]\r\n all_columns += [column for record in chunk\r\n for column in record if column not in all_columns]\r\n return all_columns\r\n\r\n\r\ndef all_columns_new():\r\n all_columns = [col for col in chunk[0]]\r\n for record in chunk:\r\n all_columns += [column for column in record if column not in all_columns]\r\n return all_columns\r\n\r\n\r\n\r\nif __name__ == '__main__':\r\n from pprint import pprint\r\n\r\n print('problem: ')\r\n pprint(all_columns_old())\r\n print('\\nfix: ')\r\n pprint(all_columns_new())\r\n\r\n```\r\n", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "pull", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/225/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": 0, "state_reason": null}