{"html_url": "https://github.com/simonw/sqlite-utils/issues/319#issuecomment-905043974", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/319", "id": 905043974, "node_id": "IC_kwDOCGYnMM418eAG", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-08-24T23:33:44Z", "updated_at": "2021-08-24T23:33:44Z", "author_association": "OWNER", "body": "Updated documentation: https://sqlite-utils.datasette.io/en/latest/cli.html#inserting-data-from-files", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 976399638, "label": "[Enhancement] Please allow 'insert-files' to insert content as text."}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/319#issuecomment-905021933", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/319", "id": 905021933, "node_id": "IC_kwDOCGYnMM418Ynt", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-08-24T22:36:04Z", "updated_at": "2021-08-24T22:36:04Z", "author_association": "OWNER", "body": "> Oh, I misread. Yes some files will not be valid UTF-8, I'd throw a warning and continue (not adding that file) but if you want to get more elaborate you could allow to define a policy on what to do. Not adding the file, index binary content or use a conversion policy like the ones available on Python's decode.\r\n\r\nI thought about supporting those different policies (with something like `--errors ignore`) but I feel like that's getting a little bit too deep into the weeds. Right now if you try to import an invalid file the behaviour is the same as for the `sqlite-utils insert` command (I added the same detailed error message):\r\n\r\n```\r\nError: Could not read file '/Users/simon/Dropbox/Development/sqlite-utils/data.txt' as text\r\n\r\n'utf-8' codec can't decode byte 0xe3 in position 83: invalid continuation byte\r\n\r\nThe input you provided uses a character encoding other than utf-8.\r\n\r\nYou can fix this by passing the --encoding= option with the encoding of the file.\r\n\r\nIf you do not know the encoding, running 'file filename.csv' may tell you.\r\n\r\nIt's often worth trying: --encoding=latin-1\r\n```\r\nIf someone has data that can't be translated to valid text using a known encoding, I'm happy leaving them to have to insert it into a `BLOB` column instead.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 976399638, "label": "[Enhancement] Please allow 'insert-files' to insert content as text."}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/319#issuecomment-905021047", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/319", "id": 905021047, "node_id": "IC_kwDOCGYnMM418YZ3", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-08-24T22:33:48Z", "updated_at": "2021-08-24T22:33:48Z", "author_association": "OWNER", "body": "I had a few doubts about the design just now. Since `content_text` is supported as a special argument, an alternative way of handling the above would be:\r\n\r\n sqlite-utils insert-files /tmp/text.db files *.txt -c path -c content_text -c size\r\n\r\nThis does exactly the same thing as just using `--text` and not specifying any columns, because the actual implementation of `--text` is as follows:\r\n\r\nhttps://github.com/simonw/sqlite-utils/blob/0c796cd945b146b7395ff5f553861400be503867/sqlite_utils/cli.py#L1851-L1855\r\n\r\nBut actually I think that's OK - ``--text`` is a useful shorthand that avoids you having to remember how to manually specify those columns with `-c`. So I'm going to leave the design as is.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 976399638, "label": "[Enhancement] Please allow 'insert-files' to insert content as text."}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/319#issuecomment-905013183", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/319", "id": 905013183, "node_id": "IC_kwDOCGYnMM418We_", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-08-24T22:15:34Z", "updated_at": "2021-08-24T22:15:34Z", "author_association": "OWNER", "body": "Here's the error message I have working for invalid unicode:\r\n```\r\nsqlite-utils insert-files /tmp/text.db files *.txt --text\r\n [------------------------------------] 0%\r\nError: Could not read file '/Users/simon/Dropbox/Development/sqlite-utils/data.txt' as text\r\n\r\n'utf-8' codec can't decode byte 0xe3 in position 83: invalid continuation byte\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 976399638, "label": "[Enhancement] Please allow 'insert-files' to insert content as text."}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/319#issuecomment-905013162", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/319", "id": 905013162, "node_id": "IC_kwDOCGYnMM418Weq", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-08-24T22:15:31Z", "updated_at": "2021-08-24T22:15:31Z", "author_association": "OWNER", "body": "I'm going to assume utf-8 but allow `--encoding` to be used to specify something different, since that option is already supported by other commands.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 976399638, "label": "[Enhancement] Please allow 'insert-files' to insert content as text."}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/319#issuecomment-905001586", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/319", "id": 905001586, "node_id": "IC_kwDOCGYnMM418Tpy", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-08-24T21:52:50Z", "updated_at": "2021-08-24T21:52:50Z", "author_association": "OWNER", "body": "Will need to re-title this section of the documentation: https://sqlite-utils.datasette.io/en/3.16/cli.html#inserting-binary-data-from-files - \"Inserting binary data from files\" will become \"Inserting data from files\"\r\n\r\nI'm OK with keeping the default as `BLOB` but I could add a `--text` option which stores the content as text instead.\r\n\r\nIf the text can't be stored as `utf-8` I'll probably raise an error.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 976399638, "label": "[Enhancement] Please allow 'insert-files' to insert content as text."}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/319#issuecomment-904999850", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/319", "id": 904999850, "node_id": "IC_kwDOCGYnMM418TOq", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-08-24T21:49:08Z", "updated_at": "2021-08-24T21:49:08Z", "author_association": "OWNER", "body": "This is a good idea.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 976399638, "label": "[Enhancement] Please allow 'insert-files' to insert content as text."}, "performed_via_github_app": null}