html_url,issue_url,id,node_id,user,user_label,created_at,updated_at,author_association,body,reactions,issue,issue_label,performed_via_github_app https://github.com/simonw/sqlite-utils/issues/319#issuecomment-905043974,https://api.github.com/repos/simonw/sqlite-utils/issues/319,905043974,IC_kwDOCGYnMM418eAG,9599,simonw,2021-08-24T23:33:44Z,2021-08-24T23:33:44Z,OWNER,Updated documentation: https://sqlite-utils.datasette.io/en/latest/cli.html#inserting-data-from-files,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",976399638,[Enhancement] Please allow 'insert-files' to insert content as text., https://github.com/simonw/sqlite-utils/issues/319#issuecomment-905021933,https://api.github.com/repos/simonw/sqlite-utils/issues/319,905021933,IC_kwDOCGYnMM418Ynt,9599,simonw,2021-08-24T22:36:04Z,2021-08-24T22:36:04Z,OWNER,"> Oh, I misread. Yes some files will not be valid UTF-8, I'd throw a warning and continue (not adding that file) but if you want to get more elaborate you could allow to define a policy on what to do. Not adding the file, index binary content or use a conversion policy like the ones available on Python's decode. I thought about supporting those different policies (with something like `--errors ignore`) but I feel like that's getting a little bit too deep into the weeds. Right now if you try to import an invalid file the behaviour is the same as for the `sqlite-utils insert` command (I added the same detailed error message): ``` Error: Could not read file '/Users/simon/Dropbox/Development/sqlite-utils/data.txt' as text 'utf-8' codec can't decode byte 0xe3 in position 83: invalid continuation byte The input you provided uses a character encoding other than utf-8. You can fix this by passing the --encoding= option with the encoding of the file. If you do not know the encoding, running 'file filename.csv' may tell you. It's often worth trying: --encoding=latin-1 ``` If someone has data that can't be translated to valid text using a known encoding, I'm happy leaving them to have to insert it into a `BLOB` column instead.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",976399638,[Enhancement] Please allow 'insert-files' to insert content as text., https://github.com/simonw/sqlite-utils/issues/319#issuecomment-905021047,https://api.github.com/repos/simonw/sqlite-utils/issues/319,905021047,IC_kwDOCGYnMM418YZ3,9599,simonw,2021-08-24T22:33:48Z,2021-08-24T22:33:48Z,OWNER,"I had a few doubts about the design just now. Since `content_text` is supported as a special argument, an alternative way of handling the above would be: sqlite-utils insert-files /tmp/text.db files *.txt -c path -c content_text -c size This does exactly the same thing as just using `--text` and not specifying any columns, because the actual implementation of `--text` is as follows: https://github.com/simonw/sqlite-utils/blob/0c796cd945b146b7395ff5f553861400be503867/sqlite_utils/cli.py#L1851-L1855 But actually I think that's OK - ``--text`` is a useful shorthand that avoids you having to remember how to manually specify those columns with `-c`. So I'm going to leave the design as is.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",976399638,[Enhancement] Please allow 'insert-files' to insert content as text., https://github.com/simonw/sqlite-utils/issues/319#issuecomment-905013183,https://api.github.com/repos/simonw/sqlite-utils/issues/319,905013183,IC_kwDOCGYnMM418We_,9599,simonw,2021-08-24T22:15:34Z,2021-08-24T22:15:34Z,OWNER,"Here's the error message I have working for invalid unicode: ``` sqlite-utils insert-files /tmp/text.db files *.txt --text [------------------------------------] 0% Error: Could not read file '/Users/simon/Dropbox/Development/sqlite-utils/data.txt' as text 'utf-8' codec can't decode byte 0xe3 in position 83: invalid continuation byte ```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",976399638,[Enhancement] Please allow 'insert-files' to insert content as text., https://github.com/simonw/sqlite-utils/issues/319#issuecomment-905013162,https://api.github.com/repos/simonw/sqlite-utils/issues/319,905013162,IC_kwDOCGYnMM418Weq,9599,simonw,2021-08-24T22:15:31Z,2021-08-24T22:15:31Z,OWNER,"I'm going to assume utf-8 but allow `--encoding` to be used to specify something different, since that option is already supported by other commands.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",976399638,[Enhancement] Please allow 'insert-files' to insert content as text., https://github.com/simonw/sqlite-utils/issues/319#issuecomment-905001586,https://api.github.com/repos/simonw/sqlite-utils/issues/319,905001586,IC_kwDOCGYnMM418Tpy,9599,simonw,2021-08-24T21:52:50Z,2021-08-24T21:52:50Z,OWNER,"Will need to re-title this section of the documentation: https://sqlite-utils.datasette.io/en/3.16/cli.html#inserting-binary-data-from-files - ""Inserting binary data from files"" will become ""Inserting data from files"" I'm OK with keeping the default as `BLOB` but I could add a `--text` option which stores the content as text instead. If the text can't be stored as `utf-8` I'll probably raise an error.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",976399638,[Enhancement] Please allow 'insert-files' to insert content as text., https://github.com/simonw/sqlite-utils/issues/319#issuecomment-904999850,https://api.github.com/repos/simonw/sqlite-utils/issues/319,904999850,IC_kwDOCGYnMM418TOq,9599,simonw,2021-08-24T21:49:08Z,2021-08-24T21:49:08Z,OWNER,This is a good idea.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",976399638,[Enhancement] Please allow 'insert-files' to insert content as text.,