{"html_url": "https://github.com/simonw/sqlite-utils/issues/444#issuecomment-1155815956", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/444", "id": 1155815956, "node_id": "IC_kwDOCGYnMM5E5FoU", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-06-14T23:49:56Z", "updated_at": "2022-07-07T16:39:18Z", "author_association": "OWNER", "body": "Yeah my initial implementation there makes no sense:\r\n```python\r\n csv_reader_args = {\"dialect\": dialect}\r\n if delimiter:\r\n csv_reader_args[\"delimiter\"] = delimiter\r\n if quotechar:\r\n csv_reader_args[\"quotechar\"] = quotechar\r\n reader = _extra_key_strategy(\r\n csv_std.reader(decoded, **csv_reader_args), ignore_extras, extras_key\r\n )\r\n first_row = next(reader)\r\n if no_headers:\r\n headers = [\"untitled_{}\".format(i + 1) for i in range(len(first_row))]\r\n reader = itertools.chain([first_row], reader)\r\n else:\r\n headers = first_row\r\n docs = (dict(zip(headers, row)) for row in reader)\r\n```\r\nBecause my `_extra_key_strategy()` helper function is designed to work against `csv.DictReader` - not against `csv.reader()` which returns a sequence of lists, not a sequence of dictionaries.\r\n\r\nIn fact, what's happening here is that `dict(zip(headers, row))` is ignoring anything in the row that doesn't correspond to a header:\r\n\r\n```pycon\r\n>>> list(zip([\"a\", \"b\"], [1, 2, 3]))\r\n[('a', 1), ('b', 2)]\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1271426387, "label": "CSV `extras_key=` and `ignore_extras=` equivalents for CLI tool"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/444#issuecomment-1155966234", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/444", "id": 1155966234, "node_id": "IC_kwDOCGYnMM5E5qUa", "user": {"value": 9599, "label": "simonw"}, "created_at": 
"2022-06-15T04:18:05Z", "updated_at": "2022-06-15T04:18:05Z", "author_association": "OWNER", "body": "I'm going to push a branch with my not-yet-working code (which does at least include a test).", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1271426387, "label": "CSV `extras_key=` and `ignore_extras=` equivalents for CLI tool"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/444#issuecomment-1155815186", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/444", "id": 1155815186, "node_id": "IC_kwDOCGYnMM5E5FcS", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-06-14T23:48:16Z", "updated_at": "2022-06-14T23:48:16Z", "author_association": "OWNER", "body": "This is tricky to implement because of this code: https://github.com/simonw/sqlite-utils/blob/b8af3b96f5c72317cc8783dc296a94f6719987d9/sqlite_utils/cli.py#L938-L945\r\n\r\nIt's reconstructing each document using the known headers here:\r\n\r\n`docs = (dict(zip(headers, row)) for row in reader)`\r\n\r\nSo my first attempt at this - the diff here - did not have the desired result:\r\n\r\n```diff\r\ndiff --git a/sqlite_utils/cli.py b/sqlite_utils/cli.py\r\nindex 86eddfb..00b920b 100644\r\n--- a/sqlite_utils/cli.py\r\n+++ b/sqlite_utils/cli.py\r\n@@ -6,7 +6,7 @@ import hashlib\r\n import pathlib\r\n import sqlite_utils\r\n from sqlite_utils.db import AlterError, BadMultiValues, DescIndex\r\n-from sqlite_utils.utils import maximize_csv_field_size_limit\r\n+from sqlite_utils.utils import maximize_csv_field_size_limit, _extra_key_strategy\r\n from sqlite_utils import recipes\r\n import textwrap\r\n import inspect\r\n@@ -797,6 +797,15 @@ _import_options = (\r\n \"--encoding\",\r\n help=\"Character encoding for input, defaults to utf-8\",\r\n ),\r\n+ click.option(\r\n+ \"--ignore-extras\",\r\n+ is_flag=True,\r\n+ help=\"If a 
CSV line has more than the expected number of values, ignore the extras\",\r\n+ ),\r\n+ click.option(\r\n+ \"--extras-key\",\r\n+ help=\"If a CSV line has more than the expected number of values put them in a list in this column\",\r\n+ ),\r\n )\r\n \r\n \r\n@@ -885,6 +894,8 @@ def insert_upsert_implementation(\r\n sniff,\r\n no_headers,\r\n encoding,\r\n+ ignore_extras,\r\n+ extras_key,\r\n batch_size,\r\n alter,\r\n upsert,\r\n@@ -909,6 +920,10 @@ def insert_upsert_implementation(\r\n raise click.ClickException(\"--flatten cannot be used with --csv or --tsv\")\r\n if encoding and not (csv or tsv):\r\n raise click.ClickException(\"--encoding must be used with --csv or --tsv\")\r\n+ if ignore_extras and extras_key:\r\n+ raise click.ClickException(\r\n+ \"--ignore-extras and --extras-key cannot be used together\"\r\n+ )\r\n if pk and len(pk) == 1:\r\n pk = pk[0]\r\n encoding = encoding or \"utf-8-sig\"\r\n@@ -935,7 +950,9 @@ def insert_upsert_implementation(\r\n csv_reader_args[\"delimiter\"] = delimiter\r\n if quotechar:\r\n csv_reader_args[\"quotechar\"] = quotechar\r\n- reader = csv_std.reader(decoded, **csv_reader_args)\r\n+ reader = _extra_key_strategy(\r\n+ csv_std.reader(decoded, **csv_reader_args), ignore_extras, extras_key\r\n+ )\r\n first_row = next(reader)\r\n if no_headers:\r\n headers = [\"untitled_{}\".format(i + 1) for i in range(len(first_row))]\r\n@@ -1101,6 +1118,8 @@ def insert(\r\n sniff,\r\n no_headers,\r\n encoding,\r\n+ ignore_extras,\r\n+ extras_key,\r\n batch_size,\r\n alter,\r\n detect_types,\r\n@@ -1176,6 +1195,8 @@ def insert(\r\n sniff,\r\n no_headers,\r\n encoding,\r\n+ ignore_extras,\r\n+ extras_key,\r\n batch_size,\r\n alter=alter,\r\n upsert=False,\r\n@@ -1214,6 +1235,8 @@ def upsert(\r\n sniff,\r\n no_headers,\r\n encoding,\r\n+ ignore_extras,\r\n+ extras_key,\r\n alter,\r\n not_null,\r\n default,\r\n@@ -1254,6 +1277,8 @@ def upsert(\r\n sniff,\r\n no_headers,\r\n encoding,\r\n+ ignore_extras,\r\n+ extras_key,\r\n batch_size,\r\n 
alter=alter,\r\n upsert=True,\r\n@@ -1297,6 +1322,8 @@ def bulk(\r\n sniff,\r\n no_headers,\r\n encoding,\r\n+ ignore_extras,\r\n+ extras_key,\r\n load_extension,\r\n ):\r\n \"\"\"\r\n@@ -1331,6 +1358,8 @@ def bulk(\r\n sniff=sniff,\r\n no_headers=no_headers,\r\n encoding=encoding,\r\n+ ignore_extras=ignore_extras,\r\n+ extras_key=extras_key,\r\n batch_size=batch_size,\r\n alter=False,\r\n upsert=False,\r\n\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1271426387, "label": "CSV `extras_key=` and `ignore_extras=` equivalents for CLI tool"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/444#issuecomment-1155804591", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/444", "id": 1155804591, "node_id": "IC_kwDOCGYnMM5E5C2v", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-06-14T23:28:36Z", "updated_at": "2022-06-14T23:28:36Z", "author_association": "OWNER", "body": "I'm going with `--extras-key` and `--ignore-extras` as the two new options.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1271426387, "label": "CSV `extras_key=` and `ignore_extras=` equivalents for CLI tool"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/444#issuecomment-1155804459", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/444", "id": 1155804459, "node_id": "IC_kwDOCGYnMM5E5C0r", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-06-14T23:28:18Z", "updated_at": "2022-06-14T23:28:18Z", "author_association": "OWNER", "body": "I think these become part of the `_import_options` list which is used in a few 
places:\r\n\r\nhttps://github.com/simonw/sqlite-utils/blob/b8af3b96f5c72317cc8783dc296a94f6719987d9/sqlite_utils/cli.py#L765-L800", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1271426387, "label": "CSV `extras_key=` and `ignore_extras=` equivalents for CLI tool"}, "performed_via_github_app": null}
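
The comments above hinge on one behavior: `dict(zip(headers, row))` silently drops any values beyond the known headers, which is why the CLI needs explicit `--ignore-extras` / `--extras-key` handling. The sketch below illustrates that truncation and one possible extras-key strategy; `extra_key_strategy` here is a hypothetical illustration, not the real `_extra_key_strategy` helper from `sqlite_utils.utils` (which, per the first comment, is designed to work against `csv.DictReader` rather than lists from `csv.reader()`).

```python
def extra_key_strategy(reader, headers, ignore_extras=False, extras_key=None):
    # Hypothetical sketch (NOT the real sqlite_utils helper): wrap a
    # csv.reader-style iterator of lists and decide what to do with
    # values that spill past the known headers.
    for row in reader:
        extras = row[len(headers):]
        # dict(zip(...)) keeps only as many values as there are headers
        doc = dict(zip(headers, row))
        if extras and extras_key and not ignore_extras:
            # capture the spill-over in a list under the extras key
            doc[extras_key] = extras
        yield doc


# The truncation the comment demonstrates: the third value vanishes
print(dict(zip(["a", "b"], [1, 2, 3])))  # {'a': 1, 'b': 2}

# With an extras key, the spill-over is preserved instead
row = ["1", "2", "3", "4"]
print(next(extra_key_strategy(iter([row]), ["a", "b"], extras_key="rest")))
# {'a': '1', 'b': '2', 'rest': ['3', '4']}
```

This mirrors why the diff in the comments did not work as written: wrapping `csv.reader()` output with a DictReader-oriented helper leaves the later `dict(zip(headers, row))` step free to discard the extras anyway.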