issues: 797159961

This data as json

id node_id number title user state locked assignee milestone comments created_at updated_at closed_at author_association pull_request body repo type active_lock_reason performed_via_github_app
797159961 MDExOlB1bGxSZXF1ZXN0NTY0MjE1MDEx 225 fix for problem in Table.insert_all on search for columns per chunk of rows 261237 closed 0     2 2021-01-29T20:16:07Z 2021-02-14T21:04:13Z 2021-02-14T21:04:13Z NONE simonw/sqlite-utils/pulls/225

Hi,

I ran into a problem when trying to create a database from my Apple Healthkit data using healthkit-to-sqlite. The program crashed because of an invalid insert statement that was generated for table rDistanceCycling.

The actual problem turned out to be in sqlite-utils. Table.insert_all processes the data to be inserted in chunks of rows and checks for every chunk which columns are used, and it will collect all column names in the variable all_columns. The collection of columns is done using a nested list comprehension that is not completely correct.

I'm using a Windows machine and had to make a few adjustments to the tests in order to be able to run them because they had a posix dependency.

Thanks, kind regards,

Frans

# this is a (condensed) chunk of data from my Apple healthkit export that caused the problem.
# the 3 last items in the chunk have additional keys: metadata_HKMetadataKeySyncVersion and metadata_HKMetadataKeySyncIdentifier

chunk = [{'sourceName': 'AppleÂ\xa0Watch van Frans', 'sourceVersion': '7.0.1',
          'device': '<<HKDevice: 0x281cf6c70>, name:Apple Watch, manufacturer:Apple Inc., model:Watch, hardware:Watch3,4, software:7.0.1>',
          'unit': 'km', 'creationDate': '2020-10-10 12:29:09 +0100', 'startDate': '2020-10-10 12:29:06 +0100',
          'endDate': '2020-10-10 12:29:07 +0100', 'value': '0.00518016'},
         {'sourceName': 'AppleÂ\xa0Watch van Frans', 'sourceVersion': '7.0.1',
          'device': '<<HKDevice: 0x281cf6c70>, name:Apple Watch, manufacturer:Apple Inc., model:Watch, hardware:Watch3,4, software:7.0.1>',
          'unit': 'km', 'creationDate': '2020-10-10 12:29:10 +0100', 'startDate': '2020-10-10 12:29:07 +0100',
          'endDate': '2020-10-10 12:29:08 +0100', 'value': '0.00544049'},
         {'sourceName': 'AppleÂ\xa0Watch van Frans', 'sourceVersion': '6.2.6',
          'device': '<<HKDevice: 0x281cf83e0>, name:Apple Watch, manufacturer:Apple Inc., model:Watch, hardware:Watch3,4, software:6.2.6>',
          'unit': 'km', 'creationDate': '2020-10-14 05:54:12 +0100', 'startDate': '2020-07-15 16:40:50 +0100',
          'endDate': '2020-07-15 16:42:49 +0100', 'value': '0.952092', 'metadata_HKMetadataKeySyncVersion': '1',
          'metadata_HKMetadataKeySyncIdentifier': '3:674DBCDB-3FE8-40D1-9FC1-E54A2B413805:616520450.99823:616520569.99360:119'},
         {'sourceName': 'AppleÂ\xa0Watch van Frans', 'sourceVersion': '6.2.6',
          'device': '<<HKDevice: 0x281cf83e0>, name:Apple Watch, manufacturer:Apple Inc., model:Watch, hardware:Watch3,4, software:6.2.6>',
          'unit': 'km', 'creationDate': '2020-10-14 05:54:12 +0100', 'startDate': '2020-07-15 16:42:49 +0100',
          'endDate': '2020-07-15 16:44:51 +0100', 'value': '0.848983', 'metadata_HKMetadataKeySyncVersion': '1',
          'metadata_HKMetadataKeySyncIdentifier': '3:674DBCDB-3FE8-40D1-9FC1-E54A2B413805:616520569.99360:616520691.98826:119'},
         {'sourceName': 'AppleÂ\xa0Watch van Frans', 'sourceVersion': '6.2.6',
          'device': '<<HKDevice: 0x281cf83e0>, name:Apple Watch, manufacturer:Apple Inc., model:Watch, hardware:Watch3,4, software:6.2.6>',
          'unit': 'km', 'creationDate': '2020-10-14 05:54:12 +0100', 'startDate': '2020-07-15 16:44:51 +0100',
          'endDate': '2020-07-15 16:46:50 +0100', 'value': '0.834403', 'metadata_HKMetadataKeySyncVersion': '1',
          'metadata_HKMetadataKeySyncIdentifier': '3:674DBCDB-3FE8-40D1-9FC1-E54A2B413805:616520691.98826:616520810.98305:119'}]



def all_columns_old():
    all_columns = [col for col in chunk[0]]
    all_columns += [column for record in chunk
                           for column in record if column not in all_columns]
    return all_columns


def all_columns_new():
    all_columns = [col for col in chunk[0]]
    for record in chunk:
        all_columns += [column for column in record if column not in all_columns]
    return all_columns



if __name__ == '__main__':
    from pprint import pprint

    print('problem: ')
    pprint(all_columns_old())
    print('\nfix: ')
    pprint(all_columns_new())
140912432 pull    

Links from other tables