html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,issue,performed_via_github_app
https://github.com/simonw/sqlite-utils/issues/319#issuecomment-905021933,https://api.github.com/repos/simonw/sqlite-utils/issues/319,905021933,IC_kwDOCGYnMM418Ynt,9599,2021-08-24T22:36:04Z,2021-08-24T22:36:04Z,OWNER,"> Oh, I misread. Yes some files will not be valid UTF-8, I'd throw a warning and continue (not adding that file) but if you want to get more elaborate you could allow to define a policy on what to do. Not adding the file, index binary content or use a conversion policy like the ones available on Python's decode.

I thought about supporting those different policies (with something like `--errors ignore`) but I feel like that's getting a little bit too deep into the weeds. Right now if you try to import an invalid file the behaviour is the same as for the `sqlite-utils insert` command (I added the same detailed error message):

```
Error: Could not read file '/Users/simon/Dropbox/Development/sqlite-utils/data.txt' as text
'utf-8' codec can't decode byte 0xe3 in position 83: invalid continuation byte
The input you provided uses a character encoding other than utf-8.
You can fix this by passing the --encoding= option with the encoding of the file.
If you do not know the encoding, running 'file filename.csv' may tell you.
It's often worth trying: --encoding=latin-1
```

If someone has data that can't be translated to valid text using a known encoding, I'm happy leaving them to have to insert it into a `BLOB` column instead.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",976399638,