{"html_url": "https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1155767915", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/440", "id": 1155767915, "node_id": "IC_kwDOCGYnMM5E455r", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-06-14T22:22:27Z", "updated_at": "2022-06-14T22:22:27Z", "author_association": "OWNER", "body": "I forgot to add equivalents of `extras_key=` and `ignore_extras=` to the CLI tool - will do that in a separate issue.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1250629388, "label": "CSV files with too many values in a row cause errors"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1155672675", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/440", "id": 1155672675, "node_id": "IC_kwDOCGYnMM5E4ipj", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-06-14T20:19:07Z", "updated_at": "2022-06-14T20:19:07Z", "author_association": "OWNER", "body": "Documentation: https://sqlite-utils.datasette.io/en/latest/python-api.html#reading-rows-from-a-file", "reactions": "{\"total_count\": 1, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 1, \"eyes\": 0}", "issue": {"value": 1250629388, "label": "CSV files with too many values in a row cause errors"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1155666672", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/440", "id": 1155666672, "node_id": "IC_kwDOCGYnMM5E4hLw", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-06-14T20:11:52Z", "updated_at": "2022-06-14T20:11:52Z", "author_association": "OWNER", "body": "I'm going to rename `restkey` to `extras_key` for consistency with `ignore_extras`.", 
"reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1250629388, "label": "CSV files with too many values in a row cause errors"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1155389614", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/440", "id": 1155389614, "node_id": "IC_kwDOCGYnMM5E3diu", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-06-14T15:54:03Z", "updated_at": "2022-06-14T15:54:03Z", "author_association": "OWNER", "body": "Filed an issue against `python/typeshed`:\r\n\r\n- https://github.com/python/typeshed/issues/8075", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1250629388, "label": "CSV files with too many values in a row cause errors"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1155358637", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/440", "id": 1155358637, "node_id": "IC_kwDOCGYnMM5E3V-t", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-06-14T15:31:34Z", "updated_at": "2022-06-14T15:31:34Z", "author_association": "OWNER", "body": "Getting this past `mypy` is really hard!\r\n\r\n```\r\n% mypy sqlite_utils\r\nsqlite_utils/utils.py:189: error: No overload variant of \"pop\" of \"MutableMapping\" matches argument type \"None\"\r\nsqlite_utils/utils.py:189: note: Possible overload variants:\r\nsqlite_utils/utils.py:189: note: def pop(self, key: str) -> str\r\nsqlite_utils/utils.py:189: note: def [_T] pop(self, key: str, default: Union[str, _T] = ...) 
-> Union[str, _T]\r\n```\r\nThat's because of this line:\r\n\r\n row.pop(None)\r\n\r\nWhich is legit here - we have a dictionary where one of the keys is `None` and we want to remove that key. But the baked-in type is apparently `def pop(self, key: str) -> str`.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1250629388, "label": "CSV files with too many values in a row cause errors"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1155350755", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/440", "id": 1155350755, "node_id": "IC_kwDOCGYnMM5E3UDj", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-06-14T15:25:18Z", "updated_at": "2022-06-14T15:25:18Z", "author_association": "OWNER", "body": "That broke `mypy`:\r\n\r\n`sqlite_utils/utils.py:229: error: Incompatible types in assignment (expression has type \"Iterable[Dict[Any, Any]]\", variable has type \"DictReader[str]\")`", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1250629388, "label": "CSV files with too many values in a row cause errors"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1155317293", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/440", "id": 1155317293, "node_id": "IC_kwDOCGYnMM5E3L4t", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-06-14T15:04:01Z", "updated_at": "2022-06-14T15:04:01Z", "author_association": "OWNER", "body": "I think that's unavoidable: it looks like `csv.Sniffer` only works if you feed it a CSV file with an equal number of values in each row, which is understandable.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 
0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1250629388, "label": "CSV files with too many values in a row cause errors"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1155310521", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/440", "id": 1155310521, "node_id": "IC_kwDOCGYnMM5E3KO5", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-06-14T14:58:50Z", "updated_at": "2022-06-14T14:58:50Z", "author_association": "OWNER", "body": "Interesting challenge in writing tests for this: if you give `csv.Sniffer` a short example with an invalid row in it sometimes it picks the wrong delimiter!\r\n\r\n id,name\\r\\n1,Cleo,oops\r\n\r\nIt decided the delimiter there was `e`.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1250629388, "label": "CSV files with too many values in a row cause errors"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1154475454", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/440", "id": 1154475454, "node_id": "IC_kwDOCGYnMM5Ez-W-", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-06-13T21:52:03Z", "updated_at": "2022-06-13T21:52:03Z", "author_association": "OWNER", "body": "The exception will be called `RowError`.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1250629388, "label": "CSV files with too many values in a row cause errors"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1154474482", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/440", "id": 1154474482, "node_id": 
"IC_kwDOCGYnMM5Ez-Hy", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-06-13T21:50:59Z", "updated_at": "2022-06-13T21:51:24Z", "author_association": "OWNER", "body": "Decision: I'm going to default to raising an exception if a row has too many values in it.\r\n\r\nYou'll be able to pass `ignore_extras=True` to ignore those extra values, or pass `restkey=\"the_rest\"` to stick them in a list in the `restkey` column.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1250629388, "label": "CSV files with too many values in a row cause errors"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1154457893", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/440", "id": 1154457893, "node_id": "IC_kwDOCGYnMM5Ez6El", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-06-13T21:29:02Z", "updated_at": "2022-06-13T21:29:02Z", "author_association": "OWNER", "body": "Here's the current function signature for `rows_from_file()`:\r\n\r\nhttps://github.com/simonw/sqlite-utils/blob/26e6d2622c57460a24ffdd0128bbaac051d51a5f/sqlite_utils/utils.py#L174-L179", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1250629388, "label": "CSV files with too many values in a row cause errors"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1154457028", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/440", "id": 1154457028, "node_id": "IC_kwDOCGYnMM5Ez53E", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-06-13T21:28:03Z", "updated_at": "2022-06-13T21:28:03Z", "author_association": "OWNER", "body": "Whatever I decide, I can implement it in `rows_from_file()`, 
maybe as an optional parameter - then decide how to call it from the `sqlite-utils insert` CLI (perhaps with a new option there too).", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1250629388, "label": "CSV files with too many values in a row cause errors"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1154456183", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/440", "id": 1154456183, "node_id": "IC_kwDOCGYnMM5Ez5p3", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-06-13T21:26:55Z", "updated_at": "2022-06-13T21:26:55Z", "author_association": "OWNER", "body": "So I need to make a design decision here: what should `sqlite-utils` do with CSV files that have rows with more values than there are headings?\r\n\r\nSome options:\r\n\r\n- Ignore those extra fields entirely - silently drop that data. I'm not keen on this.\r\n- Throw an error. The library does this already, but the error is incomprehensible - it could turn into a useful, human-readable error instead.\r\n- Put the data in a JSON list in a column with a known name (`None` is not a valid column name, so not that). This could be something like `_restkey` or `_values_with_no_heading`. 
This feels like a better option, but I'd need to carefully pick a name for it - and come up with an answer for the question of what to do if the CSV file being imported already uses that heading name for something else.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1250629388, "label": "CSV files with too many values in a row cause errors"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1154454127", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/440", "id": 1154454127, "node_id": "IC_kwDOCGYnMM5Ez5Jv", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-06-13T21:24:18Z", "updated_at": "2022-06-13T21:24:18Z", "author_association": "OWNER", "body": "That weird behaviour is documented here: https://docs.python.org/3/library/csv.html#csv.DictReader\r\n\r\n> If a row has more fields than fieldnames, the remaining data is put in a list and stored with the fieldname specified by *restkey* (which defaults to `None`). If a non-blank row has fewer fields than fieldnames, the missing values are filled-in with the value of *restval* (which defaults to `None`).", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1250629388, "label": "CSV files with too many values in a row cause errors"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1154453319", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/440", "id": 1154453319, "node_id": "IC_kwDOCGYnMM5Ez49H", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-06-13T21:23:16Z", "updated_at": "2022-06-13T21:23:16Z", "author_association": "OWNER", "body": "Aha! I think I see what's happening here. 
Here's what `DictReader` does if one of the lines has too many items in it:\r\n\r\n```pycon\r\n>>> import csv, io\r\n>>> list(csv.DictReader(io.StringIO(\"id,name\\n1,Cleo,nohead\\n2,Barry\")))\r\n[{'id': '1', 'name': 'Cleo', None: ['nohead']}, {'id': '2', 'name': 'Barry'}]\r\n```\r\nSee how that row with too many items gets this:\r\n`{'id': '1', 'name': 'Cleo', None: ['nohead']}`\r\n\r\nThat's a `None` for the key and (weirdly) a list containing the single item for the value!\r\n\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1250629388, "label": "CSV files with too many values in a row cause errors"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1154449442", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/440", "id": 1154449442, "node_id": "IC_kwDOCGYnMM5Ez4Ai", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-06-13T21:18:26Z", "updated_at": "2022-06-13T21:20:12Z", "author_association": "OWNER", "body": "Here are full steps to replicate the bug:\r\n```python\r\nfrom urllib.request import urlopen\r\nimport sqlite_utils\r\ndb = sqlite_utils.Database(memory=True)\r\nwith urlopen(\"https://artsdatabanken.no/Fab2018/api/export/csv\") as fab:\r\n    reader, other = sqlite_utils.utils.rows_from_file(fab, encoding=\"utf-16le\")\r\n    db[\"fab2018\"].insert_all(reader, pk=\"Id\")\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1250629388, "label": "CSV files with too many values in a row cause errors"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1154396400", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/440", "id": 1154396400, "node_id": 
"IC_kwDOCGYnMM5EzrDw", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-06-13T20:28:25Z", "updated_at": "2022-06-13T20:28:25Z", "author_association": "OWNER", "body": "Fixing that `key` thing (to ignore any key that is `None`) revealed a new bug:\r\n\r\n```\r\nFile ~/Dropbox/Development/sqlite-utils/sqlite_utils/utils.py:376, in hash_record(record, keys)\r\n 373 if keys is not None:\r\n 374 to_hash = {key: record[key] for key in keys}\r\n 375 return hashlib.sha1(\r\n--> 376 json.dumps(to_hash, separators=(\",\", \":\"), sort_keys=True, default=repr).encode(\r\n 377 \"utf8\"\r\n 378 )\r\n 379 ).hexdigest()\r\n\r\nFile ~/.pyenv/versions/3.8.2/lib/python3.8/json/__init__.py:234, in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, default, sort_keys, **kw)\r\n 232 if cls is None:\r\n 233 cls = JSONEncoder\r\n--> 234 return cls(\r\n 235 skipkeys=skipkeys, ensure_ascii=ensure_ascii,\r\n 236 check_circular=check_circular, allow_nan=allow_nan, indent=indent,\r\n 237 separators=separators, default=default, sort_keys=sort_keys,\r\n 238 **kw).encode(obj)\r\n\r\nFile ~/.pyenv/versions/3.8.2/lib/python3.8/json/encoder.py:199, in JSONEncoder.encode(self, o)\r\n 195 return encode_basestring(o)\r\n 196 # This doesn't pass the iterator directly to ''.join() because the\r\n 197 # exceptions aren't as detailed. 
The list call should be roughly\r\n 198 # equivalent to the PySequence_Fast that ''.join() would do.\r\n--> 199 chunks = self.iterencode(o, _one_shot=True)\r\n 200 if not isinstance(chunks, (list, tuple)):\r\n 201 chunks = list(chunks)\r\n\r\nFile ~/.pyenv/versions/3.8.2/lib/python3.8/json/encoder.py:257, in JSONEncoder.iterencode(self, o, _one_shot)\r\n 252 else:\r\n 253 _iterencode = _make_iterencode(\r\n 254 markers, self.default, _encoder, self.indent, floatstr,\r\n 255 self.key_separator, self.item_separator, self.sort_keys,\r\n 256 self.skipkeys, _one_shot)\r\n--> 257 return _iterencode(o, 0)\r\n\r\nTypeError: '<' not supported between instances of 'NoneType' and 'str'\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1250629388, "label": "CSV files with too many values in a row cause errors"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1154387591", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/440", "id": 1154387591, "node_id": "IC_kwDOCGYnMM5Ezo6H", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-06-13T20:17:51Z", "updated_at": "2022-06-13T20:17:51Z", "author_association": "OWNER", "body": "I don't understand why that works but calling `insert_all()` does not.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1250629388, "label": "CSV files with too many values in a row cause errors"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1154386795", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/440", "id": 1154386795, "node_id": "IC_kwDOCGYnMM5Ezotr", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-06-13T20:16:53Z", 
"updated_at": "2022-06-13T20:16:53Z", "author_association": "OWNER", "body": "Steps to demonstrate that `sqlite-utils insert` is not affected:\r\n\r\n```bash\r\ncurl -o artsdatabanken.csv https://artsdatabanken.no/Fab2018/api/export/csv\r\nsqlite-utils insert arts.db artsdatabanken artsdatabanken.csv --sniff --csv --encoding utf-16le\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1250629388, "label": "CSV files with too many values in a row cause errors"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1154385916", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/440", "id": 1154385916, "node_id": "IC_kwDOCGYnMM5Ezof8", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-06-13T20:15:49Z", "updated_at": "2022-06-13T20:15:49Z", "author_association": "OWNER", "body": "`rows_from_file()` isn't part of the documented API but maybe it should be!", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1250629388, "label": "CSV files with too many values in a row cause errors"}, "performed_via_github_app": null}