id,node_id,number,title,user,user_label,state,locked,assignee,assignee_label,milestone,milestone_label,comments,created_at,updated_at,closed_at,author_association,pull_request,body,repo,repo_label,type,active_lock_reason,performed_via_github_app,reactions,draft,state_reason
807437089,MDU6SXNzdWU4MDc0MzcwODk=,228,--no-headers option for CSV and TSV,9599,simonw,closed,0,,,,,10,2021-02-12T17:56:51Z,2021-12-26T07:01:31Z,2021-02-14T22:25:17Z,OWNER,,"https://bl.iro.bl.uk/work/ns/3037474a-761c-456d-a00c-9ef3c6773f4c has a fascinating CSV file that doesn't have a header row - it starts like this:

```csv
Computation and measurement of turbulent flow through idealized turbine blade passages,,""Loizou, Panos A."",https://isni.org/isni/0000000136122593,,University of Manchester,https://isni.org/isni/0000000121662407,1989,Thesis (Ph.D.),,Physical Sciences,,,https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.232781,
""Prolactin and growth hormone secretion in normal, hyperprolactinaemic and acromegalic man"",,""Prescott, R. W. G."",https://isni.org/isni/0000000134992122,,University of Newcastle upon Tyne,https://isni.org/isni/0000000104627212,1983,Thesis (Ph.D.),,Biological Sciences,,,https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.232784,
```

It would be useful if `sqlite-utils insert ... --csv` had a mechanism for importing files like this one.",140912432,sqlite-utils,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/228/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed
783778672,MDU6SXNzdWU3ODM3Nzg2NzI=,220,Better error message for *_fts methods against views,649467,mhalle,closed,0,,,,,3,2021-01-11T23:24:00Z,2021-02-22T20:44:51Z,2021-02-14T22:34:26Z,NONE,,"enable_fts and its related methods only work on tables, not views. 

Could those methods and possibly others move up to the Queryable superclass?
",140912432,sqlite-utils,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/220/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed
807174161,MDU6SXNzdWU4MDcxNzQxNjE=,227,Error reading csv files with large column data,295329,camallen,closed,0,,,,,4,2021-02-12T11:51:47Z,2021-02-16T11:48:03Z,2021-02-14T21:17:19Z,NONE,,"*Feel free to close this issue - I mostly added it for reference for future folks that run into this :)*

I have a CSV file with one column that has very long strings.  When i try to import this file via the `insert` command I get the following error: 
```
sqlite-utils insert database.db table_name file_with_large_column.csv

Traceback (most recent call last):
  File ""/usr/local/bin/sqlite-utils"", line 10, in <module>
    sys.exit(cli())
  File ""/usr/local/lib/python3.7/site-packages/click/core.py"", line 829, in __call__
    return self.main(*args, **kwargs)
  File ""/usr/local/lib/python3.7/site-packages/click/core.py"", line 782, in main
    rv = self.invoke(ctx)
  File ""/usr/local/lib/python3.7/site-packages/click/core.py"", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File ""/usr/local/lib/python3.7/site-packages/click/core.py"", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File ""/usr/local/lib/python3.7/site-packages/click/core.py"", line 610, in invoke
    return callback(*args, **kwargs)
  File ""/usr/local/lib/python3.7/site-packages/sqlite_utils/cli.py"", line 774, in insert
    default=default,
  File ""/usr/local/lib/python3.7/site-packages/sqlite_utils/cli.py"", line 705, in insert_upsert_implementation
    docs, pk=pk, batch_size=batch_size, alter=alter, **extra_kwargs
  File ""/usr/local/lib/python3.7/site-packages/sqlite_utils/db.py"", line 1852, in insert_all
    first_record = next(records)
  File ""/usr/local/lib/python3.7/site-packages/sqlite_utils/cli.py"", line 703, in <genexpr>
    docs = (decode_base64_values(doc) for doc in docs)
  File ""/usr/local/lib/python3.7/site-packages/sqlite_utils/cli.py"", line 681, in <genexpr>
    docs = (dict(zip(headers, row)) for row in reader)
_csv.Error: field larger than field limit (131072)
```
Built with the docker image `datasetteproject/datasette:0.54` with the following versions:
```
# sqlite-utils --version
sqlite-utils, version 3.4.1

# datasette --version
datasette, version 0.54
```
It appears this is a [known issue](https://stackoverflow.com/a/54517228/2761423) reading in csv files in python and [doesn't look to be modifiable](https://github.com/python/cpython/blob/ea46579067fd2d4e164d6605719ffec690c4d621/Modules/_csv.c#L1685) through system / env vars (i may be very wrong on this).

Noting that using sqlite3 `import` command work without error (not using the python csv reader)
```
sqlite3 database.db
sqlite> .mode csv
sqlite> .import file_with_large_column.csv table_name
```
Sadly I couldn't see an easy way around this while using the cli as it appears this value needs to be changed in python code. FWIW I've switched to using https://datasette.io/tools/csvs-to-sqlite for importing csv data and it's working well. 

Finally, I'm loving https://datasette.io/ thank you very much for an amazing tool and data ecosytem 🙇‍♀️ ",140912432,sqlite-utils,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/227/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed
807817197,MDU6SXNzdWU4MDc4MTcxOTc=,229,Hitting `_csv.Error: field larger than field limit (131072)`,631242,frosencrantz,closed,0,,,,,3,2021-02-13T19:52:44Z,2021-02-14T21:33:33Z,2021-02-14T21:33:33Z,NONE,,"I have a csv file where one of the fields is so large it is throwing an exception with this error and stops loading:
	```
	_csv.Error: field larger than field limit (131072)
	```

The stack trace occurs here: https://github.com/simonw/sqlite-utils/blob/3.1/sqlite_utils/cli.py#L633


There is a way to handle this that helps:
https://stackoverflow.com/questions/15063936/csv-error-field-larger-than-field-limit-131072

One issue I had with this problem was sqlite-utils only provides limited context as to where the problem line is.
There is the progress bar, but that is by percent rather than by line number.  It would have been helpful if it could have provided a line number.

Also, it would have been useful if it had allowed the loading to continue with later lines.
",140912432,sqlite-utils,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/229/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed
808008305,MDU6SXNzdWU4MDgwMDgzMDU=,230,--sniff option for sniffing delimiters,9599,simonw,closed,0,,,,,8,2021-02-14T17:43:54Z,2021-02-14T21:15:33Z,2021-02-14T19:24:32Z,OWNER,,"> I just spotted that `csv.Sniffer` in the Python standard library has a `.has_header(sample)` method which detects if the first row appears to be a header or not, which is interesting. https://docs.python.org/3/library/csv.html#csv.Sniffer

_Originally posted by @simonw in https://github.com/simonw/sqlite-utils/issues/228#issuecomment-778812050_",140912432,sqlite-utils,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/230/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed
797159961,MDExOlB1bGxSZXF1ZXN0NTY0MjE1MDEx,225,fix for problem in Table.insert_all on search for columns per chunk of rows,261237,nieuwenhoven,closed,0,,,,,2,2021-01-29T20:16:07Z,2021-02-14T21:04:13Z,2021-02-14T21:04:13Z,NONE,simonw/sqlite-utils/pulls/225,"Hi,

I ran into a problem when trying to create a database from my Apple Healthkit data using [healthkit-to-sqlite](https://github.com/dogsheep/healthkit-to-sqlite). The program crashed because of an invalid insert statement that was generated for table `rDistanceCycling`. 

The actual problem turned out to be in [sqlite-utils](https://github.com/simonw/sqlite-utils). `Table.insert_all` processes the data to be inserted in chunks of rows and checks for every chunk which columns are used, and it will collect all column names in the variable `all_columns`.  The collection of columns is done using a nested list comprehension that is not completely correct. 

I'm using a Windows machine and had to make a few adjustments to the tests in order to be able to run them because they had a posix dependency.

Thanks, kind regards,

Frans

```
# this is a (condensed) chunk of data from my Apple healthkit export that caused the problem.
# the 3 last items in the chunk have additional keys: metadata_HKMetadataKeySyncVersion and metadata_HKMetadataKeySyncIdentifier

chunk = [{'sourceName': 'AppleÂ\xa0Watch van Frans', 'sourceVersion': '7.0.1',
          'device': '<<HKDevice: 0x281cf6c70>, name:Apple Watch, manufacturer:Apple Inc., model:Watch, hardware:Watch3,4, software:7.0.1>',
          'unit': 'km', 'creationDate': '2020-10-10 12:29:09 +0100', 'startDate': '2020-10-10 12:29:06 +0100',
          'endDate': '2020-10-10 12:29:07 +0100', 'value': '0.00518016'},
         {'sourceName': 'AppleÂ\xa0Watch van Frans', 'sourceVersion': '7.0.1',
          'device': '<<HKDevice: 0x281cf6c70>, name:Apple Watch, manufacturer:Apple Inc., model:Watch, hardware:Watch3,4, software:7.0.1>',
          'unit': 'km', 'creationDate': '2020-10-10 12:29:10 +0100', 'startDate': '2020-10-10 12:29:07 +0100',
          'endDate': '2020-10-10 12:29:08 +0100', 'value': '0.00544049'},
         {'sourceName': 'AppleÂ\xa0Watch van Frans', 'sourceVersion': '6.2.6',
          'device': '<<HKDevice: 0x281cf83e0>, name:Apple Watch, manufacturer:Apple Inc., model:Watch, hardware:Watch3,4, software:6.2.6>',
          'unit': 'km', 'creationDate': '2020-10-14 05:54:12 +0100', 'startDate': '2020-07-15 16:40:50 +0100',
          'endDate': '2020-07-15 16:42:49 +0100', 'value': '0.952092', 'metadata_HKMetadataKeySyncVersion': '1',
          'metadata_HKMetadataKeySyncIdentifier': '3:674DBCDB-3FE8-40D1-9FC1-E54A2B413805:616520450.99823:616520569.99360:119'},
         {'sourceName': 'AppleÂ\xa0Watch van Frans', 'sourceVersion': '6.2.6',
          'device': '<<HKDevice: 0x281cf83e0>, name:Apple Watch, manufacturer:Apple Inc., model:Watch, hardware:Watch3,4, software:6.2.6>',
          'unit': 'km', 'creationDate': '2020-10-14 05:54:12 +0100', 'startDate': '2020-07-15 16:42:49 +0100',
          'endDate': '2020-07-15 16:44:51 +0100', 'value': '0.848983', 'metadata_HKMetadataKeySyncVersion': '1',
          'metadata_HKMetadataKeySyncIdentifier': '3:674DBCDB-3FE8-40D1-9FC1-E54A2B413805:616520569.99360:616520691.98826:119'},
         {'sourceName': 'AppleÂ\xa0Watch van Frans', 'sourceVersion': '6.2.6',
          'device': '<<HKDevice: 0x281cf83e0>, name:Apple Watch, manufacturer:Apple Inc., model:Watch, hardware:Watch3,4, software:6.2.6>',
          'unit': 'km', 'creationDate': '2020-10-14 05:54:12 +0100', 'startDate': '2020-07-15 16:44:51 +0100',
          'endDate': '2020-07-15 16:46:50 +0100', 'value': '0.834403', 'metadata_HKMetadataKeySyncVersion': '1',
          'metadata_HKMetadataKeySyncIdentifier': '3:674DBCDB-3FE8-40D1-9FC1-E54A2B413805:616520691.98826:616520810.98305:119'}]



def all_columns_old():
    all_columns = [col for col in chunk[0]]
    all_columns += [column for record in chunk
                           for column in record if column not in all_columns]
    return all_columns


def all_columns_new():
    all_columns = [col for col in chunk[0]]
    for record in chunk:
        all_columns += [column for column in record if column not in all_columns]
    return all_columns



if __name__ == '__main__':
    from pprint import pprint

    print('problem: ')
    pprint(all_columns_old())
    print('\nfix: ')
    pprint(all_columns_new())

```
",140912432,sqlite-utils,pull,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/225/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",0,
808046597,MDU6SXNzdWU4MDgwNDY1OTc=,234,.insert_all() fails if subsequent chunks contain additional columns,9599,simonw,closed,0,,,,,1,2021-02-14T21:01:51Z,2021-02-14T21:03:40Z,2021-02-14T21:03:40Z,OWNER,,Reported by @nieuwenhoven in #225 along with a proposed fix.,140912432,sqlite-utils,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/234/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed
808036774,MDU6SXNzdWU4MDgwMzY3NzQ=,232,Run tests against Windows in GitHub Actions,9599,simonw,closed,0,,,,,0,2021-02-14T20:09:45Z,2021-02-14T20:39:55Z,2021-02-14T20:39:55Z,OWNER,,"> I'm going to try and get the test suite to run in Windows on GitHub Actions.

_Originally posted by @simonw in https://github.com/simonw/sqlite-utils/issues/225#issuecomment-778834504_",140912432,sqlite-utils,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/232/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed
808037010,MDExOlB1bGxSZXF1ZXN0NTczMTQ3MTY4,233,"Run tests against Ubuntu, macOS and Windows",9599,simonw,closed,0,,,,,0,2021-02-14T20:11:02Z,2021-02-14T20:39:54Z,2021-02-14T20:39:54Z,OWNER,simonw/sqlite-utils/pulls/233,Refs #232,140912432,sqlite-utils,pull,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/233/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",0,
808028757,MDU6SXNzdWU4MDgwMjg3NTc=,231,"limit=X, offset=Y parameters for more Python methods",9599,simonw,closed,0,,,,,2,2021-02-14T19:31:23Z,2021-02-14T20:03:08Z,2021-02-14T20:03:08Z,OWNER,,"> I'm going to add a `offset=` parameter to support this case. Thanks for the suggestion!

_Originally posted by @simonw in https://github.com/simonw/sqlite-utils/issues/224#issuecomment-778828495_",140912432,sqlite-utils,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/231/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed
792297010,MDExOlB1bGxSZXF1ZXN0NTYwMjA0MzA2,224,Add fts offset docs.,37962604,polyrand,closed,0,,,,,2,2021-01-22T20:50:58Z,2021-02-14T19:31:06Z,2021-02-14T19:31:06Z,NONE,simonw/sqlite-utils/pulls/224,"The limit can be passed as a string to the query builder to have an offset. I have tested it using the shorthand `limit=f""15, 30""`, the standard syntax should work too.",140912432,sqlite-utils,pull,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/224/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",0,