issue_comments

10 rows where "created_at" is on date 2022-06-14 and issue = 1250495688 sorted by updated_at descending

user 1

  • simonw 10

issue 1

  • Misleading progress bar against utf-16-le CSV input · 10

author_association 1

  • OWNER 10
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions issue performed_via_github_app
1155789101 https://github.com/simonw/sqlite-utils/issues/439#issuecomment-1155789101 https://api.github.com/repos/simonw/sqlite-utils/issues/439 IC_kwDOCGYnMM5E4_Et simonw 9599 2022-06-14T23:00:45Z 2022-06-14T23:00:45Z OWNER

I'm going to mark this as "help wanted" and leave it open. I'm glad that it's not actually a bug where errors get swallowed.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Misleading progress bar against utf-16-le CSV input 1250495688  
1155788944 https://github.com/simonw/sqlite-utils/issues/439#issuecomment-1155788944 https://api.github.com/repos/simonw/sqlite-utils/issues/439 IC_kwDOCGYnMM5E4_CQ simonw 9599 2022-06-14T23:00:24Z 2022-06-14T23:00:24Z OWNER

The progress bar only works if the file-like object passed to it has a fp.fileno() that isn't 0 (for stdin) - that's how it detects that the file is something which it can measure the size of in order to show progress.

If we know the file size in bytes AND we know the character encoding, can we change UpdateWrapper to update the number of bytes-per-character instead?

I don't think so: I can't see a way of definitively saying "for this encoding the number of bytes per character is X" - and in fact I'm pretty sure that question doesn't even make sense since variable-length encodings exist.
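
To illustrate why a fixed bytes-per-character figure is not available, here is a minimal sketch that measures how many bytes each character occupies in a given encoding. The helper name `bytes_per_character` and the sample string are assumptions made for this illustration; nothing here comes from sqlite-utils.

```python
def bytes_per_character(text, encoding):
    """Return the encoded size in bytes of each character in text."""
    return [len(ch.encode(encoding)) for ch in text]


# Four characters: ASCII, Latin-1 range, a BMP symbol, and an emoji outside the BMP
sample = "a\u00e9\u20ac\U0001F600"

utf8_sizes = bytes_per_character(sample, "utf-8")       # [1, 2, 3, 4]
utf16_sizes = bytes_per_character(sample, "utf-16-le")  # [2, 2, 2, 4]
```

Both encodings are variable-length, so the ratio between decoded characters and bytes on disk depends on the content of the file, not only on the encoding name.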

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Misleading progress bar against utf-16-le CSV input 1250495688  
1155784284 https://github.com/simonw/sqlite-utils/issues/439#issuecomment-1155784284 https://api.github.com/repos/simonw/sqlite-utils/issues/439 IC_kwDOCGYnMM5E495c simonw 9599 2022-06-14T22:51:03Z 2022-06-14T22:52:13Z OWNER

Yes, this is the problem. The progress bar length is set to the length in bytes of the file - os.path.getsize(file.name) - but it's then incremented by the length of each DECODED line in turn.

So if the file is in utf-16-le (about twice the size of utf-8 for mostly ASCII content) the progress bar will finish at around 50%!
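
The arithmetic can be checked with a small sketch. It assumes a progress total equal to the encoded size in bytes (standing in for `os.path.getsize(file.name)`) and an increment equal to the length of each decoded line; the function name `progress_fraction` and the sample CSV text are invented for this example.

```python
import io


def progress_fraction(text, encoding):
    """Fraction reached by a byte-sized bar that is advanced by decoded line lengths."""
    raw = text.encode(encoding)
    total_bytes = len(raw)  # stands in for os.path.getsize(file.name)
    decoded = io.TextIOWrapper(io.BytesIO(raw), encoding=encoding, newline="")
    advanced = sum(len(line) for line in decoded)  # characters, not bytes
    return advanced / total_bytes


sample = "id;name\n1;Cleo\n2;Pancakes\n"
utf8_fraction = progress_fraction(sample, "utf-8")       # 1.0 for pure ASCII
utf16_fraction = progress_fraction(sample, "utf-16-le")  # 0.5 for pure ASCII
```

For content containing characters that need more than one UTF-8 byte or more than two UTF-16 bytes, both fractions would differ from these values.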

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Misleading progress bar against utf-16-le CSV input 1250495688  
1155782835 https://github.com/simonw/sqlite-utils/issues/439#issuecomment-1155782835 https://api.github.com/repos/simonw/sqlite-utils/issues/439 IC_kwDOCGYnMM5E49iz simonw 9599 2022-06-14T22:48:22Z 2022-06-14T22:49:53Z OWNER

Here's the code that implements the progress bar in question: https://github.com/simonw/sqlite-utils/blob/1b09538bc6c1fda773590f3e600993ef06591041/sqlite_utils/cli.py#L918-L932

It calls file_progress() which looks like this:

https://github.com/simonw/sqlite-utils/blob/1b09538bc6c1fda773590f3e600993ef06591041/sqlite_utils/utils.py#L159-L175

Which uses this:

https://github.com/simonw/sqlite-utils/blob/1b09538bc6c1fda773590f3e600993ef06591041/sqlite_utils/utils.py#L148-L156
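
For readers who do not want to follow the links, the following is a simplified sketch of the general pattern discussed in this thread: a wrapper around a text file that reports the length of each line to a callback as it is read. It is not a copy of the linked code; the class name `LineLengthReporter` and the list used as a callback are assumptions for illustration only.

```python
import io


class LineLengthReporter:
    """Hypothetical wrapper: yields lines and reports each line's length to a callback."""

    def __init__(self, wrapped, update):
        self._wrapped = wrapped
        self._update = update

    def __iter__(self):
        for line in self._wrapped:
            # len() of a str counts decoded characters, not bytes on disk
            self._update(len(line))
            yield line


raw = "a;b\n1;2\n".encode("utf-16-le")  # 16 bytes on disk
steps = []  # collects the increments a progress bar would receive
text_file = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-16-le", newline="")
lines = list(LineLengthReporter(text_file, steps.append))
```

Here the increments add up to 8 while the byte length is 16, which is the mismatch a byte-sized progress bar would display as an incomplete run.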

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Misleading progress bar against utf-16-le CSV input 1250495688  
1155781399 https://github.com/simonw/sqlite-utils/issues/439#issuecomment-1155781399 https://api.github.com/repos/simonw/sqlite-utils/issues/439 IC_kwDOCGYnMM5E49MX simonw 9599 2022-06-14T22:45:41Z 2022-06-14T22:45:41Z OWNER

TIL how to use iconv: https://til.simonwillison.net/linux/iconv

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Misleading progress bar against utf-16-le CSV input 1250495688  
1155776023 https://github.com/simonw/sqlite-utils/issues/439#issuecomment-1155776023 https://api.github.com/repos/simonw/sqlite-utils/issues/439 IC_kwDOCGYnMM5E474X simonw 9599 2022-06-14T22:36:07Z 2022-06-14T22:36:07Z OWNER

Wait! The arguments in that are the wrong way round. This is correct:

sqlite-utils insert --csv --delimiter ";" --encoding "utf-16-le" test.db test csv

It still outputs the following:

[------------------------------------] 0% [#################-------------------] 49% 00:00:02%

But it creates a test.db file that is 6.2MB.

That database has 3142 rows in it:

```
% sqlite-utils tables test.db --counts -t
table    count
test      3142
```

I converted that `csv` file to utf-8 like so:

iconv -f UTF-16LE -t UTF-8 csv > utf8.csv

And it contains 3142 lines:

% wc -l utf8.csv
3142 utf8.csv

So my hunch here is that the problem is actually that the progress bar doesn't know how to correctly measure files in utf-16-le encoding!
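
Where `iconv` is not available, a similar conversion can be sketched in Python. This is an assumption-laden illustration rather than part of the thread: the function name `transcode`, its parameters, and the chunk size are all invented here, and byte-order-mark handling may differ from `iconv`.

```python
def transcode(src, dst, from_encoding="utf-16-le", to_encoding="utf-8",
              chunk_chars=65536):
    """Copy src to dst, decoding with from_encoding and re-encoding with to_encoding."""
    # newline="" disables newline translation so line endings pass through unchanged
    with open(src, "r", encoding=from_encoding, newline="") as fin, \
            open(dst, "w", encoding=to_encoding, newline="") as fout:
        while True:
            chunk = fin.read(chunk_chars)  # read a bounded number of characters
            if not chunk:
                break
            fout.write(chunk)
```

Calling `transcode("csv", "utf8.csv")` would correspond roughly to the `iconv` command above.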

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Misleading progress bar against utf-16-le CSV input 1250495688  
1155772244 https://github.com/simonw/sqlite-utils/issues/439#issuecomment-1155772244 https://api.github.com/repos/simonw/sqlite-utils/issues/439 IC_kwDOCGYnMM5E469U simonw 9599 2022-06-14T22:30:03Z 2022-06-14T22:30:03Z OWNER

Tried this:

```
% python -i $(which sqlite-utils) insert --csv --delimiter ";" --encoding "utf-16-le" test test.db csv
[------------------------------------] 0%
[#################-------------------] 49% 00:00:01Traceback (most recent call last):
  File "/Users/simon/.local/share/virtualenvs/sqlite-utils-C4Ilevlm/lib/python3.8/site-packages/click/core.py", line 1072, in main
    ctx.exit()
  File "/Users/simon/.local/share/virtualenvs/sqlite-utils-C4Ilevlm/lib/python3.8/site-packages/click/core.py", line 692, in exit
    raise Exit(code)
click.exceptions.Exit: 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/simon/.local/share/virtualenvs/sqlite-utils-C4Ilevlm/bin/sqlite-utils", line 33, in <module>
    sys.exit(load_entry_point('sqlite-utils', 'console_scripts', 'sqlite-utils')())
  File "/Users/simon/.local/share/virtualenvs/sqlite-utils-C4Ilevlm/lib/python3.8/site-packages/click/core.py", line 1137, in __call__
    return self.main(*args, **kwargs)
  File "/Users/simon/.local/share/virtualenvs/sqlite-utils-C4Ilevlm/lib/python3.8/site-packages/click/core.py", line 1090, in main
    sys.exit(e.exit_code)
SystemExit: 0
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Misleading progress bar against utf-16-le CSV input 1250495688  
1155771462 https://github.com/simonw/sqlite-utils/issues/439#issuecomment-1155771462 https://api.github.com/repos/simonw/sqlite-utils/issues/439 IC_kwDOCGYnMM5E46xG simonw 9599 2022-06-14T22:28:38Z 2022-06-14T22:28:38Z OWNER

Maybe this isn't a CSV field value problem - I tried this patch and didn't seem to hit the new breakpoints:

diff --git a/sqlite_utils/utils.py b/sqlite_utils/utils.py
index d2ccc5f..f1b823a 100644
--- a/sqlite_utils/utils.py
+++ b/sqlite_utils/utils.py
@@ -204,13 +204,17 @@ def _extra_key_strategy(
         # DictReader adds a 'None' key with extra row values
         if None not in row:
             yield row
-        elif ignore_extras:
+            continue
+        else:
+            breakpoint()
+        if ignore_extras:
             # ignoring row.pop(none) because of this issue:
             # https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1155358637
             row.pop(None)  # type: ignore
             yield row
         elif not extras_key:
             extras = row.pop(None)  # type: ignore
+            breakpoint()
             raise RowError(
                 "Row {} contained these extra values: {}".format(row, extras)
             )

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Misleading progress bar against utf-16-le CSV input 1250495688  
1155769216 https://github.com/simonw/sqlite-utils/issues/439#issuecomment-1155769216 https://api.github.com/repos/simonw/sqlite-utils/issues/439 IC_kwDOCGYnMM5E46OA simonw 9599 2022-06-14T22:24:49Z 2022-06-14T22:25:06Z OWNER

I have a hunch that this crash may be caused by a CSV value which is too long, as addressed at the library level in:

  • #440

But not yet addressed in the CLI tool, see:

  • #444

Either way though, I really don't like that errors like this are swallowed!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Misleading progress bar against utf-16-le CSV input 1250495688  
1155767202 https://github.com/simonw/sqlite-utils/issues/439#issuecomment-1155767202 https://api.github.com/repos/simonw/sqlite-utils/issues/439 IC_kwDOCGYnMM5E45ui simonw 9599 2022-06-14T22:21:10Z 2022-06-14T22:21:10Z OWNER

I can't figure out why that error is being swallowed like that. The most likely culprit was this code:

https://github.com/simonw/sqlite-utils/blob/1b09538bc6c1fda773590f3e600993ef06591041/sqlite_utils/cli.py#L1021-L1043

But I tried changing it like this:

diff --git a/sqlite_utils/cli.py b/sqlite_utils/cli.py
index 86eddfb..ed26fdd 100644
--- a/sqlite_utils/cli.py
+++ b/sqlite_utils/cli.py
@@ -1023,6 +1023,7 @@ def insert_upsert_implementation(
                 docs, pk=pk, batch_size=batch_size, alter=alter, **extra_kwargs
             )
         except Exception as e:
+            raise
             if (
                 isinstance(e, sqlite3.OperationalError)
                 and e.args

And your steps to reproduce still got to 49% and then failed silently.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Misleading progress bar against utf-16-le CSV input 1250495688  

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
, [performed_via_github_app] TEXT);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);
Powered by Datasette · Queries took 19.835ms · About: github-to-sqlite