{"html_url": "https://github.com/simonw/sqlite-utils/issues/319#issuecomment-905043974", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/319", "id": 905043974, "node_id": "IC_kwDOCGYnMM418eAG", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-08-24T23:33:44Z", "updated_at": "2021-08-24T23:33:44Z", "author_association": "OWNER", "body": "Updated documentation: https://sqlite-utils.datasette.io/en/latest/cli.html#inserting-data-from-files", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 976399638, "label": "[Enhancement] Please allow 'insert-files' to insert content as text."}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/319#issuecomment-905024066", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/319", "id": 905024066, "node_id": "IC_kwDOCGYnMM418ZJC", "user": {"value": 66709385, "label": "pjamargh"}, "created_at": "2021-08-24T22:41:39Z", "updated_at": "2021-08-24T22:41:39Z", "author_association": "NONE", "body": "I'm happy with this functionality left the way you describe. In my case the data is homogeneous but other cases would work just by being consistent on the encoding. 
Thanks a lot, Simon!", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 976399638, "label": "[Enhancement] Please allow 'insert-files' to insert content as text."}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/319#issuecomment-905021933", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/319", "id": 905021933, "node_id": "IC_kwDOCGYnMM418Ynt", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-08-24T22:36:04Z", "updated_at": "2021-08-24T22:36:04Z", "author_association": "OWNER", "body": "> Oh, I misread. Yes some files will not be valid UTF-8, I'd throw a warning and continue (not adding that file) but if you want to get more elaborate you could allow to define a policy on what to do. Not adding the file, index binary content or use a conversion policy like the ones available on Python's decode.\r\n\r\nI thought about supporting those different policies (with something like `--errors ignore`) but I feel like that's getting a little bit too deep into the weeds. 
Right now if you try to import an invalid file the behaviour is the same as for the `sqlite-utils insert` command (I added the same detailed error message):\r\n\r\n```\r\nError: Could not read file '/Users/simon/Dropbox/Development/sqlite-utils/data.txt' as text\r\n\r\n'utf-8' codec can't decode byte 0xe3 in position 83: invalid continuation byte\r\n\r\nThe input you provided uses a character encoding other than utf-8.\r\n\r\nYou can fix this by passing the --encoding= option with the encoding of the file.\r\n\r\nIf you do not know the encoding, running 'file filename.csv' may tell you.\r\n\r\nIt's often worth trying: --encoding=latin-1\r\n```\r\nIf someone has data that can't be translated to valid text using a known encoding, I'm happy leaving them to have to insert it into a `BLOB` column instead.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 976399638, "label": "[Enhancement] Please allow 'insert-files' to insert content as text."}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/319#issuecomment-905021047", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/319", "id": 905021047, "node_id": "IC_kwDOCGYnMM418YZ3", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-08-24T22:33:48Z", "updated_at": "2021-08-24T22:33:48Z", "author_association": "OWNER", "body": "I had a few doubts about the design just now. 
Since `content_text` is supported as a special argument, an alternative way of handling the above would be:\r\n\r\n    sqlite-utils insert-files /tmp/text.db files *.txt -c path -c content_text -c size\r\n\r\nThis does exactly the same thing as just using `--text` and not specifying any columns, because the actual implementation of `--text` is as follows:\r\n\r\nhttps://github.com/simonw/sqlite-utils/blob/0c796cd945b146b7395ff5f553861400be503867/sqlite_utils/cli.py#L1851-L1855\r\n\r\nBut actually I think that's OK - `--text` is a useful shorthand that avoids you having to remember how to manually specify those columns with `-c`. So I'm going to leave the design as is.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 976399638, "label": "[Enhancement] Please allow 'insert-files' to insert content as text."}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/319#issuecomment-905021010", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/319", "id": 905021010, "node_id": "IC_kwDOCGYnMM418YZS", "user": {"value": 66709385, "label": "pjamargh"}, "created_at": "2021-08-24T22:33:42Z", "updated_at": "2021-08-24T22:33:42Z", "author_association": "NONE", "body": "Oh, I misread. Yes some files will not be valid UTF-8, I'd throw a warning and continue (not adding that file) but if you want to get more elaborate you could allow to define a policy on what to do. Not adding the file, index binary content or use a conversion policy like the ones available on Python's decode.\r\n\r\nFrom https://stackoverflow.com/questions/24616678/unicodedecodeerror-in-python-when-reading-a-file-how-to-ignore-the-error-and-ju :\r\n - 'ignore' ignores errors. 
Note that ignoring encoding errors can lead to data loss.\r\n - 'replace' causes a replacement marker (such as '?') to be inserted where there is malformed data.\r\n - 'surrogateescape' will represent any incorrect bytes as code points in the Unicode Private Use Area ranging from U+DC80 to U+DCFF. These private code points will then be turned back into the same bytes when the surrogateescape error handler is used when writing data. This is useful for processing files in an unknown encoding.\r\n - 'xmlcharrefreplace' is only supported when writing to a file. Characters not supported by the encoding are replaced with the appropriate XML character reference &#nnn;.\r\n - 'backslashreplace' (also only supported when writing) replaces unsupported characters with Python\u2019s backslashed escape sequences.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 976399638, "label": "[Enhancement] Please allow 'insert-files' to insert content as text."}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/319#issuecomment-905013183", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/319", "id": 905013183, "node_id": "IC_kwDOCGYnMM418We_", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-08-24T22:15:34Z", "updated_at": "2021-08-24T22:15:34Z", "author_association": "OWNER", "body": "Here's the error message I have working for invalid unicode:\r\n```\r\nsqlite-utils insert-files /tmp/text.db files *.txt --text\r\n [------------------------------------] 0%\r\nError: Could not read file '/Users/simon/Dropbox/Development/sqlite-utils/data.txt' as text\r\n\r\n'utf-8' codec can't decode byte 0xe3 in position 83: invalid continuation byte\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", 
"issue": {"value": 976399638, "label": "[Enhancement] Please allow 'insert-files' to insert content as text."}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/319#issuecomment-905013162", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/319", "id": 905013162, "node_id": "IC_kwDOCGYnMM418Weq", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-08-24T22:15:31Z", "updated_at": "2021-08-24T22:15:31Z", "author_association": "OWNER", "body": "I'm going to assume utf-8 but allow `--encoding` to be used to specify something different, since that option is already supported by other commands.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 976399638, "label": "[Enhancement] Please allow 'insert-files' to insert content as text."}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/319#issuecomment-905003381", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/319", "id": 905003381, "node_id": "IC_kwDOCGYnMM418UF1", "user": {"value": 66709385, "label": "pjamargh"}, "created_at": "2021-08-24T21:56:49Z", "updated_at": "2021-08-24T21:56:49Z", "author_association": "NONE", "body": "I was thinking that an approach could be making FILE_COLUMNS a generator (_get_file_columns(mode)) or you can just have a different set of columns (is there something else that makes sense to be changed on the text scenario?).\r\n\r\nAbout UTF-8 I was referring to the encoding to use when reading files. 
This can be difficult to auto-detect but I believe that UTF-8 is pretty much the standard for text files.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 976399638, "label": "[Enhancement] Please allow 'insert-files' to insert content as text."}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/319#issuecomment-905001586", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/319", "id": 905001586, "node_id": "IC_kwDOCGYnMM418Tpy", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-08-24T21:52:50Z", "updated_at": "2021-08-24T21:52:50Z", "author_association": "OWNER", "body": "Will need to re-title this section of the documentation: https://sqlite-utils.datasette.io/en/3.16/cli.html#inserting-binary-data-from-files - \"Inserting binary data from files\" will become \"Inserting data from files\"\r\n\r\nI'm OK with keeping the default as `BLOB` but I could add a `--text` option which stores the content as text instead.\r\n\r\nIf the text can't be stored as `utf-8` I'll probably raise an error.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 976399638, "label": "[Enhancement] Please allow 'insert-files' to insert content as text."}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/319#issuecomment-904999850", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/319", "id": 904999850, "node_id": "IC_kwDOCGYnMM418TOq", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-08-24T21:49:08Z", "updated_at": "2021-08-24T21:49:08Z", "author_association": "OWNER", "body": "This is a good idea.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, 
\"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 976399638, "label": "[Enhancement] Please allow 'insert-files' to insert content as text."}, "performed_via_github_app": null}