github
html_url | issue_url | id | node_id | user | created_at | updated_at | author_association | body | reactions | issue | performed_via_github_app |
---|---|---|---|---|---|---|---|---|---|---|---|
https://github.com/simonw/sqlite-utils/issues/319#issuecomment-905043974 | https://api.github.com/repos/simonw/sqlite-utils/issues/319 | 905043974 | IC_kwDOCGYnMM418eAG | 9599 | 2021-08-24T23:33:44Z | 2021-08-24T23:33:44Z | OWNER | Updated documentation: https://sqlite-utils.datasette.io/en/latest/cli.html#inserting-data-from-files | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | 976399638 | |
https://github.com/simonw/sqlite-utils/issues/319#issuecomment-905021933 | https://api.github.com/repos/simonw/sqlite-utils/issues/319 | 905021933 | IC_kwDOCGYnMM418Ynt | 9599 | 2021-08-24T22:36:04Z | 2021-08-24T22:36:04Z | OWNER | > Oh, I misread. Yes some files will not be valid UTF-8, I'd throw a warning and continue (not adding that file) but if you want to get more elaborate you could allow to define a policy on what to do. Not adding the file, index binary content or use a conversion policy like the ones available on Python's decode. I thought about supporting those different policies (with something like `--errors ignore`) but I feel like that's getting a little bit too deep into the weeds. Right now if you try to import an invalid file the behaviour is the same as for the `sqlite-utils insert` command (I added the same detailed error message): ``` Error: Could not read file '/Users/simon/Dropbox/Development/sqlite-utils/data.txt' as text 'utf-8' codec can't decode byte 0xe3 in position 83: invalid continuation byte The input you provided uses a character encoding other than utf-8. You can fix this by passing the --encoding= option with the encoding of the file. If you do not know the encoding, running 'file filename.csv' may tell you. It's often worth trying: --encoding=latin-1 ``` If someone has data that can't be translated to valid text using a known encoding, I'm happy leaving them to have to insert it into a `BLOB` column instead. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | 976399638 | |
https://github.com/simonw/sqlite-utils/issues/319#issuecomment-905021047 | https://api.github.com/repos/simonw/sqlite-utils/issues/319 | 905021047 | IC_kwDOCGYnMM418YZ3 | 9599 | 2021-08-24T22:33:48Z | 2021-08-24T22:33:48Z | OWNER | I had a few doubts about the design just now. Since `content_text` is supported as a special argument, an alternative way of handling the above would be: `sqlite-utils insert-files /tmp/text.db files *.txt -c path -c content_text -c size` This does exactly the same thing as just using `--text` and not specifying any columns, because the actual implementation of `--text` is as follows: https://github.com/simonw/sqlite-utils/blob/0c796cd945b146b7395ff5f553861400be503867/sqlite_utils/cli.py#L1851-L1855 But actually I think that's OK - `--text` is a useful shorthand that avoids you having to remember how to manually specify those columns with `-c`. So I'm going to leave the design as is. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | 976399638 | |
https://github.com/simonw/sqlite-utils/issues/319#issuecomment-905013183 | https://api.github.com/repos/simonw/sqlite-utils/issues/319 | 905013183 | IC_kwDOCGYnMM418We_ | 9599 | 2021-08-24T22:15:34Z | 2021-08-24T22:15:34Z | OWNER | Here's the error message I have working for invalid unicode: ``` sqlite-utils insert-files /tmp/text.db files *.txt --text [------------------------------------] 0% Error: Could not read file '/Users/simon/Dropbox/Development/sqlite-utils/data.txt' as text 'utf-8' codec can't decode byte 0xe3 in position 83: invalid continuation byte ``` | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | 976399638 | |
https://github.com/simonw/sqlite-utils/issues/319#issuecomment-905013162 | https://api.github.com/repos/simonw/sqlite-utils/issues/319 | 905013162 | IC_kwDOCGYnMM418Weq | 9599 | 2021-08-24T22:15:31Z | 2021-08-24T22:15:31Z | OWNER | I'm going to assume utf-8 but allow `--encoding` to be used to specify something different, since that option is already supported by other commands. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | 976399638 | |
https://github.com/simonw/sqlite-utils/issues/319#issuecomment-905001586 | https://api.github.com/repos/simonw/sqlite-utils/issues/319 | 905001586 | IC_kwDOCGYnMM418Tpy | 9599 | 2021-08-24T21:52:50Z | 2021-08-24T21:52:50Z | OWNER | Will need to re-title this section of the documentation: https://sqlite-utils.datasette.io/en/3.16/cli.html#inserting-binary-data-from-files - "Inserting binary data from files" will become "Inserting data from files". I'm OK with keeping the default as `BLOB` but I could add a `--text` option which stores the content as text instead. If the text can't be stored as `utf-8` I'll probably raise an error. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | 976399638 | |
https://github.com/simonw/sqlite-utils/issues/319#issuecomment-904999850 | https://api.github.com/repos/simonw/sqlite-utils/issues/319 | 904999850 | IC_kwDOCGYnMM418TOq | 9599 | 2021-08-24T21:49:08Z | 2021-08-24T21:49:08Z | OWNER | This is a good idea. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | 976399638 | |
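
The comments above describe reading each file as text, assuming utf-8 unless `--encoding` says otherwise, and raising a descriptive error when decoding fails. A minimal Python sketch of that behaviour follows. It is an illustration only, not the sqlite-utils implementation: `read_text_file` and `TextDecodeError` are hypothetical names, and the hint text is a paraphrase of the quoted error message.

```python
from pathlib import Path


class TextDecodeError(ValueError):
    """Hypothetical error type for this sketch; not part of sqlite-utils."""


def read_text_file(path, encoding="utf-8"):
    """Return the file's content as str, or raise an error with a hint.

    Assumption: utf-8 is the default and the caller may override it,
    mirroring the --encoding option discussed in the comments.
    """
    raw = Path(path).read_bytes()
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as ex:
        # Keep the codec's own message and add a suggestion for the user.
        raise TextDecodeError(
            "Could not read file '{}' as text\n\n{}\n\n"
            "The input may use a character encoding other than {}. "
            "It is often worth trying latin-1.".format(path, ex, encoding)
        ) from ex
```

Calling `read_text_file("data.txt", encoding="latin-1")` would retry with a different encoding. Data that cannot be decoded with any known encoding is left for a `BLOB` column, as the last quoted comment suggests.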