{"html_url": "https://github.com/simonw/sqlite-utils/issues/439#issuecomment-1155953345", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/439", "id": 1155953345, "node_id": "IC_kwDOCGYnMM5E5nLB", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-06-15T03:53:43Z", "updated_at": "2022-06-15T03:53:43Z", "author_association": "OWNER", "body": "I tried fixing this by using `.tell()` to read the file position as I was iterating through it:\r\n```diff\r\ndiff --git a/sqlite_utils/utils.py b/sqlite_utils/utils.py\r\nindex d2ccc5f..29ad12e 100644\r\n--- a/sqlite_utils/utils.py\r\n+++ b/sqlite_utils/utils.py\r\n@@ -149,10 +149,13 @@ class UpdateWrapper:\r\n def __init__(self, wrapped, update):\r\n self._wrapped = wrapped\r\n self._update = update\r\n+ self._tell = wrapped.tell()\r\n \r\n def __iter__(self):\r\n for line in self._wrapped:\r\n- self._update(len(line))\r\n+ tell = self._wrapped.tell()\r\n+ self._update(self._tell - tell)\r\n+ self._tell = tell\r\n yield line\r\n ```\r\nThis did not work - I get this error:\r\n\r\n```\r\n File \"/Users/simon/Dropbox/Development/sqlite-utils/sqlite_utils/utils.py\", line 206, in _extra_key_strategy\r\n for row in reader:\r\n File \"/Users/simon/Dropbox/Development/sqlite-utils/sqlite_utils/utils.py\", line 156, in __iter__\r\n tell = self._wrapped.tell()\r\nOSError: telling position disabled by next() call\r\n```\r\nIt looks like you can't use `.tell()` during iteration: https://stackoverflow.com/questions/29618936/how-to-solve-oserror-telling-position-disabled-by-next-call", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1250495688, "label": "Misleading progress bar against utf-16-le CSV input"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/439#issuecomment-1155789101", "issue_url": 
"https://api.github.com/repos/simonw/sqlite-utils/issues/439", "id": 1155789101, "node_id": "IC_kwDOCGYnMM5E4_Et", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-06-14T23:00:45Z", "updated_at": "2022-06-14T23:00:45Z", "author_association": "OWNER", "body": "I'm going to mark this as \"help wanted\" and leave it open. I'm glad that it's not actually a bug where errors get swallowed.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1250495688, "label": "Misleading progress bar against utf-16-le CSV input"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/439#issuecomment-1155788944", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/439", "id": 1155788944, "node_id": "IC_kwDOCGYnMM5E4_CQ", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-06-14T23:00:24Z", "updated_at": "2022-06-14T23:00:24Z", "author_association": "OWNER", "body": "The progress bar only works if the file-like object passed to it has a `fp.fileno()` that isn't 0 (for stdin) - that's how it detects that the file is something which it can measure the size of in order to show progress.\r\n\r\nIf we know the file size in bytes AND we know the character encoding, can we change `UpdateWrapper` to update the number of bytes-per-character instead?\r\n\r\nI don't think so: I can't see a way of definitively saying \"for this encoding the number of bytes per character is X\" - and in fact I'm pretty sure that question doesn't even make sense since variable-length encodings exist.\r\n\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1250495688, "label": "Misleading progress bar against utf-16-le CSV input"}, "performed_via_github_app": null} {"html_url": 
"https://github.com/simonw/sqlite-utils/issues/439#issuecomment-1155784284", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/439", "id": 1155784284, "node_id": "IC_kwDOCGYnMM5E495c", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-06-14T22:51:03Z", "updated_at": "2022-06-14T22:52:13Z", "author_association": "OWNER", "body": "Yes, this is the problem. The progress bar length is set to the length in bytes of the file - `os.path.getsize(file.name)` - but it's then incremented by the length of each DECODED line in turn.\r\n\r\nSo if the file is in `utf-16-le` (twice the size of `utf-8`) the progress bar will finish at 50%!", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1250495688, "label": "Misleading progress bar against utf-16-le CSV input"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/439#issuecomment-1155782835", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/439", "id": 1155782835, "node_id": "IC_kwDOCGYnMM5E49iz", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-06-14T22:48:22Z", "updated_at": "2022-06-14T22:49:53Z", "author_association": "OWNER", "body": "Here's the code that implements the progress bar in question: https://github.com/simonw/sqlite-utils/blob/1b09538bc6c1fda773590f3e600993ef06591041/sqlite_utils/cli.py#L918-L932\r\n\r\nIt calls `file_progress()` which looks like this:\r\n\r\nhttps://github.com/simonw/sqlite-utils/blob/1b09538bc6c1fda773590f3e600993ef06591041/sqlite_utils/utils.py#L159-L175\r\n\r\nWhich uses this:\r\n\r\nhttps://github.com/simonw/sqlite-utils/blob/1b09538bc6c1fda773590f3e600993ef06591041/sqlite_utils/utils.py#L148-L156", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 
1250495688, "label": "Misleading progress bar against utf-16-le CSV input"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/439#issuecomment-1155781399", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/439", "id": 1155781399, "node_id": "IC_kwDOCGYnMM5E49MX", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-06-14T22:45:41Z", "updated_at": "2022-06-14T22:45:41Z", "author_association": "OWNER", "body": "TIL how to use `iconv`: https://til.simonwillison.net/linux/iconv", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1250495688, "label": "Misleading progress bar against utf-16-le CSV input"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/439#issuecomment-1155776023", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/439", "id": 1155776023, "node_id": "IC_kwDOCGYnMM5E474X", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-06-14T22:36:07Z", "updated_at": "2022-06-14T22:36:07Z", "author_association": "OWNER", "body": "Wait! The arguments in that are the wrong way round. 
This is correct:\r\n\r\n sqlite-utils insert --csv --delimiter \";\" --encoding \"utf-16-le\" test.db test csv\r\n\r\nIt still outputs the following:\r\n\r\n [------------------------------------] 0%\r\n [#################-------------------] 49% 00:00:02%\r\n\r\nBut it creates a `test.db` file that is 6.2MB.\r\n\r\nThat database has 3142 rows in it:\r\n\r\n```\r\n% sqlite-utils tables test.db --counts -t\r\ntable count\r\n------- -------\r\ntest 3142\r\n```\r\nI converted that `csv` file to utf-8 like so:\r\n\r\n iconv -f UTF-16LE -t UTF-8 csv > utf8.csv\r\n\r\nAnd it contains 3142 lines:\r\n```\r\n% wc -l utf8.csv \r\n 3142 utf8.csv\r\n```\r\nSo my hunch here is that the problem is actually that the progress bar doesn't know how to correctly measure files in `utf-16-le` encoding!", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1250495688, "label": "Misleading progress bar against utf-16-le CSV input"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/439#issuecomment-1155772244", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/439", "id": 1155772244, "node_id": "IC_kwDOCGYnMM5E469U", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-06-14T22:30:03Z", "updated_at": "2022-06-14T22:30:03Z", "author_association": "OWNER", "body": "Tried this:\r\n```\r\n% python -i $(which sqlite-utils) insert --csv --delimiter \";\" --encoding \"utf-16-le\" test test.db csv\r\n [------------------------------------] 0%\r\n [#################-------------------] 49% 00:00:01Traceback (most recent call last):\r\n File \"/Users/simon/.local/share/virtualenvs/sqlite-utils-C4Ilevlm/lib/python3.8/site-packages/click/core.py\", line 1072, in main\r\n ctx.exit()\r\n File \"/Users/simon/.local/share/virtualenvs/sqlite-utils-C4Ilevlm/lib/python3.8/site-packages/click/core.py\", line 692, in 
exit\r\n raise Exit(code)\r\nclick.exceptions.Exit: 0\r\n\r\nDuring handling of the above exception, another exception occurred:\r\n\r\nTraceback (most recent call last):\r\n File \"/Users/simon/.local/share/virtualenvs/sqlite-utils-C4Ilevlm/bin/sqlite-utils\", line 33, in \r\n sys.exit(load_entry_point('sqlite-utils', 'console_scripts', 'sqlite-utils')())\r\n File \"/Users/simon/.local/share/virtualenvs/sqlite-utils-C4Ilevlm/lib/python3.8/site-packages/click/core.py\", line 1137, in __call__\r\n return self.main(*args, **kwargs)\r\n File \"/Users/simon/.local/share/virtualenvs/sqlite-utils-C4Ilevlm/lib/python3.8/site-packages/click/core.py\", line 1090, in main\r\n sys.exit(e.exit_code)\r\nSystemExit: 0\r\n>>> \r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1250495688, "label": "Misleading progress bar against utf-16-le CSV input"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/439#issuecomment-1155771462", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/439", "id": 1155771462, "node_id": "IC_kwDOCGYnMM5E46xG", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-06-14T22:28:38Z", "updated_at": "2022-06-14T22:28:38Z", "author_association": "OWNER", "body": "Maybe this isn't a CSV field value problem - I tried this patch and didn't seem to hit the new breakpoints:\r\n```diff\r\ndiff --git a/sqlite_utils/utils.py b/sqlite_utils/utils.py\r\nindex d2ccc5f..f1b823a 100644\r\n--- a/sqlite_utils/utils.py\r\n+++ b/sqlite_utils/utils.py\r\n@@ -204,13 +204,17 @@ def _extra_key_strategy(\r\n # DictReader adds a 'None' key with extra row values\r\n if None not in row:\r\n yield row\r\n- elif ignore_extras:\r\n+ continue\r\n+ else:\r\n+ breakpoint()\r\n+ if ignore_extras:\r\n # ignoring row.pop(none) because of this issue:\r\n # 
https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1155358637\r\n row.pop(None) # type: ignore\r\n yield row\r\n elif not extras_key:\r\n extras = row.pop(None) # type: ignore\r\n+ breakpoint()\r\n raise RowError(\r\n \"Row {} contained these extra values: {}\".format(row, extras)\r\n )\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1250495688, "label": "Misleading progress bar against utf-16-le CSV input"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/439#issuecomment-1155769216", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/439", "id": 1155769216, "node_id": "IC_kwDOCGYnMM5E46OA", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-06-14T22:24:49Z", "updated_at": "2022-06-14T22:25:06Z", "author_association": "OWNER", "body": "I have a hunch that this crash may be caused by a CSV value which is too long, as addressed at the library level in:\r\n- #440\r\n\r\nBut not yet addressed in the CLI tool, see:\r\n\r\n- #444\r\n\r\nEither way though, I really don't like that errors like this are swallowed!", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1250495688, "label": "Misleading progress bar against utf-16-le CSV input"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/439#issuecomment-1155767202", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/439", "id": 1155767202, "node_id": "IC_kwDOCGYnMM5E45ui", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-06-14T22:21:10Z", "updated_at": "2022-06-14T22:21:10Z", "author_association": "OWNER", "body": "I can't figure out why that error is being swallowed like that. 
The most likely culprit was this code: \r\n\r\nhttps://github.com/simonw/sqlite-utils/blob/1b09538bc6c1fda773590f3e600993ef06591041/sqlite_utils/cli.py#L1021-L1043\r\n\r\nBut I tried changing it like this:\r\n\r\n```diff\r\ndiff --git a/sqlite_utils/cli.py b/sqlite_utils/cli.py\r\nindex 86eddfb..ed26fdd 100644\r\n--- a/sqlite_utils/cli.py\r\n+++ b/sqlite_utils/cli.py\r\n@@ -1023,6 +1023,7 @@ def insert_upsert_implementation(\r\n docs, pk=pk, batch_size=batch_size, alter=alter, **extra_kwargs\r\n )\r\n except Exception as e:\r\n+ raise\r\n if (\r\n isinstance(e, sqlite3.OperationalError)\r\n and e.args\r\n```\r\nAnd your steps to reproduce still got to 49% and then failed silently.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1250495688, "label": "Misleading progress bar against utf-16-le CSV input"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/439#issuecomment-1139426398", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/439", "id": 1139426398, "node_id": "IC_kwDOCGYnMM5D6kRe", "user": {"value": 4068, "label": "frafra"}, "created_at": "2022-05-27T09:04:05Z", "updated_at": "2022-05-27T10:44:54Z", "author_association": "NONE", "body": "This code works:\r\n\r\n```python\r\nimport csv\r\nimport sqlite_utils\r\ndb = sqlite_utils.Database(\"test.db\")\r\nreader = csv.DictReader(open(\"csv\", encoding=\"utf-16-le\").read().split(\"\\r\\n\"), delimiter=\";\")\r\ndb[\"test\"].insert_all(reader, pk=\"Id\")\r\n```\r\n\r\nI used `iconv` to change the encoding; sqlite-utils can import the resulting file, even if it stops at 98 %:\r\n\r\n```\r\nsqlite-utils insert --csv test test.db clean \r\n [------------------------------------] 0%\r\n [###################################-] 98% 00:00:00\r\n```\r\n\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 
0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1250495688, "label": "Misleading progress bar against utf-16-le CSV input"}, "performed_via_github_app": null}