id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,pull_request,body,repo,type,active_lock_reason,performed_via_github_app,reactions,draft,state_reason
807817197,MDU6SXNzdWU4MDc4MTcxOTc=,229,Hitting `_csv.Error: field larger than field limit (131072)`,631242,closed,0,,,3,2021-02-13T19:52:44Z,2021-02-14T21:33:33Z,2021-02-14T21:33:33Z,NONE,,"I have a csv file where one of the fields is so large it is throwing an exception with this error and stops loading:
	```
	_csv.Error: field larger than field limit (131072)
	```

The stack trace occurs here: https://github.com/simonw/sqlite-utils/blob/3.1/sqlite_utils/cli.py#L633


There is a way to handle this that helps:
https://stackoverflow.com/questions/15063936/csv-error-field-larger-than-field-limit-131072

One issue I had with this problem was sqlite-utils only provides limited context as to where the problem line is.
There is the progress bar, but that is by percent rather than by line number.  It would have been helpful if it could have provided a line number.

Also, it would have been useful if it had allowed the loading to continue with later lines.
",140912432,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/229/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed
808008305,MDU6SXNzdWU4MDgwMDgzMDU=,230,--sniff option for sniffing delimiters,9599,closed,0,,,8,2021-02-14T17:43:54Z,2021-02-14T21:15:33Z,2021-02-14T19:24:32Z,OWNER,,"> I just spotted that `csv.Sniffer` in the Python standard library has a `.has_header(sample)` method which detects if the first row appears to be a header or not, which is interesting. https://docs.python.org/3/library/csv.html#csv.Sniffer

_Originally posted by @simonw in https://github.com/simonw/sqlite-utils/issues/228#issuecomment-778812050_",140912432,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/230/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed
797159961,MDExOlB1bGxSZXF1ZXN0NTY0MjE1MDEx,225,fix for problem in Table.insert_all on search for columns per chunk of rows,261237,closed,0,,,2,2021-01-29T20:16:07Z,2021-02-14T21:04:13Z,2021-02-14T21:04:13Z,NONE,simonw/sqlite-utils/pulls/225,"Hi,

I ran into a problem when trying to create a database from my Apple Healthkit data using [healthkit-to-sqlite](https://github.com/dogsheep/healthkit-to-sqlite). The program crashed because of an invalid insert statement that was generated for table `rDistanceCycling`. 

The actual problem turned out to be in [sqlite-utils](https://github.com/simonw/sqlite-utils). `Table.insert_all` processes the data to be inserted in chunks of rows and checks for every chunk which columns are used, and it will collect all column names in the variable `all_columns`.  The collection of columns is done using a nested list comprehension that is not completely correct. 

I'm using a Windows machine and had to make a few adjustments to the tests in order to be able to run them because they had a posix dependency.

Thanks, kind regards,

Frans

```
# this is a (condensed) chunk of data from my Apple healthkit export that caused the problem.
# the 3 last items in the chunk have additional keys: metadata_HKMetadataKeySyncVersion and metadata_HKMetadataKeySyncIdentifier

chunk = [{'sourceName': 'AppleÂ\xa0Watch van Frans', 'sourceVersion': '7.0.1',
          'device': '<<HKDevice: 0x281cf6c70>, name:Apple Watch, manufacturer:Apple Inc., model:Watch, hardware:Watch3,4, software:7.0.1>',
          'unit': 'km', 'creationDate': '2020-10-10 12:29:09 +0100', 'startDate': '2020-10-10 12:29:06 +0100',
          'endDate': '2020-10-10 12:29:07 +0100', 'value': '0.00518016'},
         {'sourceName': 'AppleÂ\xa0Watch van Frans', 'sourceVersion': '7.0.1',
          'device': '<<HKDevice: 0x281cf6c70>, name:Apple Watch, manufacturer:Apple Inc., model:Watch, hardware:Watch3,4, software:7.0.1>',
          'unit': 'km', 'creationDate': '2020-10-10 12:29:10 +0100', 'startDate': '2020-10-10 12:29:07 +0100',
          'endDate': '2020-10-10 12:29:08 +0100', 'value': '0.00544049'},
         {'sourceName': 'AppleÂ\xa0Watch van Frans', 'sourceVersion': '6.2.6',
          'device': '<<HKDevice: 0x281cf83e0>, name:Apple Watch, manufacturer:Apple Inc., model:Watch, hardware:Watch3,4, software:6.2.6>',
          'unit': 'km', 'creationDate': '2020-10-14 05:54:12 +0100', 'startDate': '2020-07-15 16:40:50 +0100',
          'endDate': '2020-07-15 16:42:49 +0100', 'value': '0.952092', 'metadata_HKMetadataKeySyncVersion': '1',
          'metadata_HKMetadataKeySyncIdentifier': '3:674DBCDB-3FE8-40D1-9FC1-E54A2B413805:616520450.99823:616520569.99360:119'},
         {'sourceName': 'AppleÂ\xa0Watch van Frans', 'sourceVersion': '6.2.6',
          'device': '<<HKDevice: 0x281cf83e0>, name:Apple Watch, manufacturer:Apple Inc., model:Watch, hardware:Watch3,4, software:6.2.6>',
          'unit': 'km', 'creationDate': '2020-10-14 05:54:12 +0100', 'startDate': '2020-07-15 16:42:49 +0100',
          'endDate': '2020-07-15 16:44:51 +0100', 'value': '0.848983', 'metadata_HKMetadataKeySyncVersion': '1',
          'metadata_HKMetadataKeySyncIdentifier': '3:674DBCDB-3FE8-40D1-9FC1-E54A2B413805:616520569.99360:616520691.98826:119'},
         {'sourceName': 'AppleÂ\xa0Watch van Frans', 'sourceVersion': '6.2.6',
          'device': '<<HKDevice: 0x281cf83e0>, name:Apple Watch, manufacturer:Apple Inc., model:Watch, hardware:Watch3,4, software:6.2.6>',
          'unit': 'km', 'creationDate': '2020-10-14 05:54:12 +0100', 'startDate': '2020-07-15 16:44:51 +0100',
          'endDate': '2020-07-15 16:46:50 +0100', 'value': '0.834403', 'metadata_HKMetadataKeySyncVersion': '1',
          'metadata_HKMetadataKeySyncIdentifier': '3:674DBCDB-3FE8-40D1-9FC1-E54A2B413805:616520691.98826:616520810.98305:119'}]



def all_columns_old():
    all_columns = [col for col in chunk[0]]
    all_columns += [column for record in chunk
                           for column in record if column not in all_columns]
    return all_columns


def all_columns_new():
    all_columns = [col for col in chunk[0]]
    for record in chunk:
        all_columns += [column for column in record if column not in all_columns]
    return all_columns



if __name__ == '__main__':
    from pprint import pprint

    print('problem: ')
    pprint(all_columns_old())
    print('\nfix: ')
    pprint(all_columns_new())

```
",140912432,pull,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/225/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",0,
808046597,MDU6SXNzdWU4MDgwNDY1OTc=,234,.insert_all() fails if subsequent chunks contain additional columns,9599,closed,0,,,1,2021-02-14T21:01:51Z,2021-02-14T21:03:40Z,2021-02-14T21:03:40Z,OWNER,,Reported by @nieuwenhoven in #225 along with a proposed fix.,140912432,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/234/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed
808036774,MDU6SXNzdWU4MDgwMzY3NzQ=,232,Run tests against Windows in GitHub Actions,9599,closed,0,,,0,2021-02-14T20:09:45Z,2021-02-14T20:39:55Z,2021-02-14T20:39:55Z,OWNER,,"> I'm going to try and get the test suite to run in Windows on GitHub Actions.

_Originally posted by @simonw in https://github.com/simonw/sqlite-utils/issues/225#issuecomment-778834504_",140912432,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/232/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed
808037010,MDExOlB1bGxSZXF1ZXN0NTczMTQ3MTY4,233,"Run tests against Ubuntu, macOS and Windows",9599,closed,0,,,0,2021-02-14T20:11:02Z,2021-02-14T20:39:54Z,2021-02-14T20:39:54Z,OWNER,simonw/sqlite-utils/pulls/233,Refs #232,140912432,pull,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/233/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",0,
808028757,MDU6SXNzdWU4MDgwMjg3NTc=,231,"limit=X, offset=Y parameters for more Python methods",9599,closed,0,,,2,2021-02-14T19:31:23Z,2021-02-14T20:03:08Z,2021-02-14T20:03:08Z,OWNER,,"> I'm going to add a `offset=` parameter to support this case. Thanks for the suggestion!

_Originally posted by @simonw in https://github.com/simonw/sqlite-utils/issues/224#issuecomment-778828495_",140912432,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/231/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed
792297010,MDExOlB1bGxSZXF1ZXN0NTYwMjA0MzA2,224,Add fts offset docs.,37962604,closed,0,,,2,2021-01-22T20:50:58Z,2021-02-14T19:31:06Z,2021-02-14T19:31:06Z,NONE,simonw/sqlite-utils/pulls/224,"The limit can be passed as a string to the query builder to have an offset. I have tested it using the shorthand `limit=f""15, 30""`, the standard syntax should work too.",140912432,pull,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/224/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",0,