html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,issue,performed_via_github_app
https://github.com/simonw/sqlite-utils/issues/444#issuecomment-1155815956,https://api.github.com/repos/simonw/sqlite-utils/issues/444,1155815956,IC_kwDOCGYnMM5E5FoU,9599,2022-06-14T23:49:56Z,2022-07-07T16:39:18Z,OWNER,"Yeah my initial implementation there makes no sense:
```python
csv_reader_args = {""dialect"": dialect}
if delimiter:
    csv_reader_args[""delimiter""] = delimiter
if quotechar:
    csv_reader_args[""quotechar""] = quotechar
reader = _extra_key_strategy(
    csv_std.reader(decoded, **csv_reader_args), ignore_extras, extras_key
)
first_row = next(reader)
if no_headers:
    headers = [""untitled_{}"".format(i + 1) for i in range(len(first_row))]
    reader = itertools.chain([first_row], reader)
else:
    headers = first_row
docs = (dict(zip(headers, row)) for row in reader)
```
That's because my `_extra_key_strategy()` helper function is designed to work against `csv.DictReader`, not against `csv.reader()`, which returns a sequence of lists rather than a sequence of dictionaries.
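To make the mismatch concrete, here is a minimal standard-library sketch. It does not use the real `_extra_key_strategy()`, and the sample data and the `extras` key name are made up for illustration. `csv.DictReader` collects surplus values under its `restkey`, while `csv.reader()` just hands back a longer list:
```python
import csv
import io

# Hypothetical sample: the second data row has one value too many.
data = 'id,name\n1,Cleo\n2,Pancakes,oops\n'

# csv.reader yields plain lists, so the extra value is just a third list item.
list_rows = list(csv.reader(io.StringIO(data)))

# csv.DictReader yields dictionaries; surplus values are gathered into a list
# stored under restkey ('extras' is an illustrative name, None is the default).
dict_rows = list(csv.DictReader(io.StringIO(data), restkey='extras'))

print(list_rows[2])
print(dict_rows[1])
```
This suggests that any handling of extra values has to happen either on the dictionaries that `csv.DictReader` produces or on the raw lists before they are zipped against the headers.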
In fact, what's happening here is that `dict(zip(headers, row))` is silently dropping any values in the row that don't correspond to a header, because `zip()` stops as soon as its shortest input is exhausted:
```pycon
>>> list(zip([""a"", ""b""], [1, 2, 3]))
[('a', 1), ('b', 2)]
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1271426387,
https://github.com/simonw/sqlite-utils/issues/444#issuecomment-1155815186,https://api.github.com/repos/simonw/sqlite-utils/issues/444,1155815186,IC_kwDOCGYnMM5E5FcS,9599,2022-06-14T23:48:16Z,2022-06-14T23:48:16Z,OWNER,"This is tricky to implement because of this code: https://github.com/simonw/sqlite-utils/blob/b8af3b96f5c72317cc8783dc296a94f6719987d9/sqlite_utils/cli.py#L938-L945
It's reconstructing each document using the known headers here:
`docs = (dict(zip(headers, row)) for row in reader)`
So my first attempt at this - the diff here - did not have the desired result:
```diff
diff --git a/sqlite_utils/cli.py b/sqlite_utils/cli.py
index 86eddfb..00b920b 100644
--- a/sqlite_utils/cli.py
+++ b/sqlite_utils/cli.py
@@ -6,7 +6,7 @@ import hashlib
import pathlib
import sqlite_utils
from sqlite_utils.db import AlterError, BadMultiValues, DescIndex
-from sqlite_utils.utils import maximize_csv_field_size_limit
+from sqlite_utils.utils import maximize_csv_field_size_limit, _extra_key_strategy
from sqlite_utils import recipes
import textwrap
import inspect
@@ -797,6 +797,15 @@ _import_options = (
""--encoding"",
help=""Character encoding for input, defaults to utf-8"",
),
+ click.option(
+ ""--ignore-extras"",
+ is_flag=True,
+ help=""If a CSV line has more than the expected number of values, ignore the extras"",
+ ),
+ click.option(
+ ""--extras-key"",
+ help=""If a CSV line has more than the expected number of values put them in a list in this column"",
+ ),
)
@@ -885,6 +894,8 @@ def insert_upsert_implementation(
sniff,
no_headers,
encoding,
+ ignore_extras,
+ extras_key,
batch_size,
alter,
upsert,
@@ -909,6 +920,10 @@ def insert_upsert_implementation(
raise click.ClickException(""--flatten cannot be used with --csv or --tsv"")
if encoding and not (csv or tsv):
raise click.ClickException(""--encoding must be used with --csv or --tsv"")
+ if ignore_extras and extras_key:
+ raise click.ClickException(
+ ""--ignore-extras and --extras-key cannot be used together""
+ )
if pk and len(pk) == 1:
pk = pk[0]
encoding = encoding or ""utf-8-sig""
@@ -935,7 +950,9 @@ def insert_upsert_implementation(
csv_reader_args[""delimiter""] = delimiter
if quotechar:
csv_reader_args[""quotechar""] = quotechar
- reader = csv_std.reader(decoded, **csv_reader_args)
+ reader = _extra_key_strategy(
+ csv_std.reader(decoded, **csv_reader_args), ignore_extras, extras_key
+ )
first_row = next(reader)
if no_headers:
headers = [""untitled_{}"".format(i + 1) for i in range(len(first_row))]
@@ -1101,6 +1118,8 @@ def insert(
sniff,
no_headers,
encoding,
+ ignore_extras,
+ extras_key,
batch_size,
alter,
detect_types,
@@ -1176,6 +1195,8 @@ def insert(
sniff,
no_headers,
encoding,
+ ignore_extras,
+ extras_key,
batch_size,
alter=alter,
upsert=False,
@@ -1214,6 +1235,8 @@ def upsert(
sniff,
no_headers,
encoding,
+ ignore_extras,
+ extras_key,
alter,
not_null,
default,
@@ -1254,6 +1277,8 @@ def upsert(
sniff,
no_headers,
encoding,
+ ignore_extras,
+ extras_key,
batch_size,
alter=alter,
upsert=True,
@@ -1297,6 +1322,8 @@ def bulk(
sniff,
no_headers,
encoding,
+ ignore_extras,
+ extras_key,
load_extension,
):
""""""
@@ -1331,6 +1358,8 @@ def bulk(
sniff=sniff,
no_headers=no_headers,
encoding=encoding,
+ ignore_extras=ignore_extras,
+ extras_key=extras_key,
batch_size=batch_size,
alter=False,
upsert=False,
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1271426387,
https://github.com/simonw/sqlite-utils/issues/444#issuecomment-1155804591,https://api.github.com/repos/simonw/sqlite-utils/issues/444,1155804591,IC_kwDOCGYnMM5E5C2v,9599,2022-06-14T23:28:36Z,2022-06-14T23:28:36Z,OWNER,I'm going with `--extras-key` and `--ignore-extras` as the two new options.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1271426387,
https://github.com/simonw/sqlite-utils/issues/444#issuecomment-1155804459,https://api.github.com/repos/simonw/sqlite-utils/issues/444,1155804459,IC_kwDOCGYnMM5E5C0r,9599,2022-06-14T23:28:18Z,2022-06-14T23:28:18Z,OWNER,"I think these become part of the `_import_options` list which is used in a few places:
https://github.com/simonw/sqlite-utils/blob/b8af3b96f5c72317cc8783dc296a94f6719987d9/sqlite_utils/cli.py#L765-L800","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1271426387,