issues
1 row where body = "I was using sqlite-utils to create a DB from a CSV and it turns out the CSV contains a NUL byte. When the processing reaches the line that contains the NUL an exception is raised. I'm wondering if there is something that can be done in `sqlite-utils` to say "skip lines with encoding errors" or some such. I think it isn't super straightforward though as the exception comes from inside the `csv` module that does all the parsing. Concretely the file is the `KernelVersions.csv` from https://www.kaggle.com/datasets/kaggle/meta-kaggle This is the command and output: ``` $ sqlite-utils insert --csv kaggle.db kaggle KernelVersions.csv [------------------------------------] 0% [#####################---------------] 60% 00:04:24Traceback (most recent call last): File "/home/foobar/miniconda/envs/meta-kaggle/bin/sqlite-utils", line 10, in <module> sys.exit(cli()) File "/home/foobar/miniconda/envs/meta-kaggle/lib/python3.10/site-packages/click/core.py", line 1128, in __call__ return self.main(*args, **kwargs) File "/home/foobar/miniconda/envs/meta-kaggle/lib/python3.10/site-packages/click/core.py", line 1053, in main rv = self.invoke(ctx) File "/home/foobar/miniconda/envs/meta-kaggle/lib/python3.10/site-packages/click/core.py", line 1659, in invoke return _process_result(sub_ctx.command.invoke(sub_ctx)) File "/home/foobar/miniconda/envs/meta-kaggle/lib/python3.10/site-packages/click/core.py", line 1395, in invoke return ctx.invoke(self.callback, **ctx.params) File "/home/foobar/miniconda/envs/meta-kaggle/lib/python3.10/site-packages/click/core.py", line 754, in invoke return __callback(*args, **kwargs) File "/home/foobar/miniconda/envs/meta-kaggle/lib/python3.10/site-packages/sqlite_utils/cli.py", line 1223, in insert insert_upsert_implementation( File "/home/foobar/miniconda/envs/meta-kaggle/lib/python3.10/site-packages/sqlite_utils/cli.py", line 1085, in insert_upsert_implementation db[table].insert_all( File "/home/foobar/miniconda/envs/meta-kaggle/lib/python3.10/site-packages/sqlite_utils/db.py", line 3198, in insert_all chunk = list(chunk) File "/home/foobar/miniconda/envs/meta-kaggle/lib/python3.10/site-packages/sqlite_utils/db.py", line 3742, in fix_square_braces for record in records: File "/home/foobar/miniconda/envs/meta-kaggle/lib/python3.10/site-packages/sqlite_utils/cli.py", line 1071, in <genexpr> docs = (decode_base64_values(doc) for doc in docs) File "/home/foobar/miniconda/envs/meta-kaggle/lib/python3.10/site-packages/sqlite_utils/cli.py", line 1068, in <genexpr> docs = (verify_is_dict(doc) for doc in docs) File "/home/foobar/miniconda/envs/meta-kaggle/lib/python3.10/site-packages/sqlite_utils/cli.py", line 1003, in <genexpr> docs = (dict(zip(headers, row)) for row in reader) _csv.Error: line contains NUL ```" and state = "open" sorted by updated_at descending
This data as json, CSV (advanced)
Suggested facets: created_at (date), updated_at (date)
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at ▲ | closed_at | author_association | pull_request | body | repo | type | active_lock_reason | performed_via_github_app | reactions | draft | state_reason |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1839344979 | I_kwDOCGYnMM5toi1T | 582 | Handling CSV/file input that contains NUL bytes | betatim 1448859 | open | 0 | 0 | 2023-08-07T12:24:14Z | 2023-08-07T12:24:14Z | NONE | I was using sqlite-utils to create a DB from a CSV and it turns out the CSV contains a NUL byte. When the processing reaches the line that contains the NUL an exception is raised. I'm wondering if there is something that can be done in Concretely the file is the This is the command and output:
|
sqlite-utils 140912432 | issue | { "url": "https://api.github.com/repos/simonw/sqlite-utils/issues/582/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Advanced export
JSON shape: default, array, newline-delimited, object
CREATE TABLE [issues] ( [id] INTEGER PRIMARY KEY, [node_id] TEXT, [number] INTEGER, [title] TEXT, [user] INTEGER REFERENCES [users]([id]), [state] TEXT, [locked] INTEGER, [assignee] INTEGER REFERENCES [users]([id]), [milestone] INTEGER REFERENCES [milestones]([id]), [comments] INTEGER, [created_at] TEXT, [updated_at] TEXT, [closed_at] TEXT, [author_association] TEXT, [pull_request] TEXT, [body] TEXT, [repo] INTEGER REFERENCES [repos]([id]), [type] TEXT , [active_lock_reason] TEXT, [performed_via_github_app] TEXT, [reactions] TEXT, [draft] INTEGER, [state_reason] TEXT); CREATE INDEX [idx_issues_repo] ON [issues] ([repo]); CREATE INDEX [idx_issues_milestone] ON [issues] ([milestone]); CREATE INDEX [idx_issues_assignee] ON [issues] ([assignee]); CREATE INDEX [idx_issues_user] ON [issues] ([user]);