issue_comments

20 rows where author_association = "OWNER", issue = 1250629388 and user = 9599 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at author_association body reactions issue performed_via_github_app
1155767915 https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1155767915 https://api.github.com/repos/simonw/sqlite-utils/issues/440 IC_kwDOCGYnMM5E455r simonw 9599 2022-06-14T22:22:27Z 2022-06-14T22:22:27Z OWNER

I forgot to add equivalents of extras_key= and ignore_extras= to the CLI tool - will do that in a separate issue.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
CSV files with too many values in a row cause errors 1250629388  
1155672675 https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1155672675 https://api.github.com/repos/simonw/sqlite-utils/issues/440 IC_kwDOCGYnMM5E4ipj simonw 9599 2022-06-14T20:19:07Z 2022-06-14T20:19:07Z OWNER

Documentation: https://sqlite-utils.datasette.io/en/latest/python-api.html#reading-rows-from-a-file
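
Based on that page, a minimal usage sketch (assuming the sqlite-utils release that shipped this change; the `rest` column name here is just an illustration):

```python
import io

from sqlite_utils.utils import rows_from_file, Format

# "oops" has no matching heading; extras_key collects surplus values
# in a list under the given column name instead of raising an error
rows, format_used = rows_from_file(
    io.BytesIO(b"id,name\r\n1,Cleo,oops"),
    format=Format.CSV,
    extras_key="rest",
)
print(list(rows))
# expected: [{'id': '1', 'name': 'Cleo', 'rest': ['oops']}]
```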

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 1,
    "eyes": 0
}
CSV files with too many values in a row cause errors 1250629388  
1155666672 https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1155666672 https://api.github.com/repos/simonw/sqlite-utils/issues/440 IC_kwDOCGYnMM5E4hLw simonw 9599 2022-06-14T20:11:52Z 2022-06-14T20:11:52Z OWNER

I'm going to rename restkey to extras_key for consistency with ignore_extras.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
CSV files with too many values in a row cause errors 1250629388  
1155389614 https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1155389614 https://api.github.com/repos/simonw/sqlite-utils/issues/440 IC_kwDOCGYnMM5E3diu simonw 9599 2022-06-14T15:54:03Z 2022-06-14T15:54:03Z OWNER

Filed an issue against python/typeshed:

  • https://github.com/python/typeshed/issues/8075
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
CSV files with too many values in a row cause errors 1250629388  
1155358637 https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1155358637 https://api.github.com/repos/simonw/sqlite-utils/issues/440 IC_kwDOCGYnMM5E3V-t simonw 9599 2022-06-14T15:31:34Z 2022-06-14T15:31:34Z OWNER

Getting this past mypy is really hard!

```
% mypy sqlite_utils
sqlite_utils/utils.py:189: error: No overload variant of "pop" of "MutableMapping" matches argument type "None"
sqlite_utils/utils.py:189: note: Possible overload variants:
sqlite_utils/utils.py:189: note:     def pop(self, key: str) -> str
sqlite_utils/utils.py:189: note:     def [_T] pop(self, key: str, default: Union[str, _T] = ...) -> Union[str, _T]
```

That's because of this line:

```python
row.pop(key=None)
```

Which is legit here - we have a dictionary where one of the keys is `None` and we want to remove that key. But the baked-in type is apparently `def pop(self, key: str) -> str`.
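
One way to get a call like that past mypy is to widen the dictionary's type first; a sketch, not necessarily how the library resolved it:

```python
import csv
import io
from typing import Any, Dict, cast

reader = csv.DictReader(io.StringIO("id,name\n1,Cleo,extra"))
for row in reader:
    # mypy types row as Dict[str, str], but at runtime DictReader can
    # add a None key holding the surplus values; cast before popping it
    extras = cast(Dict[Any, Any], row).pop(None, None)
    print(row, extras)
```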

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
CSV files with too many values in a row cause errors 1250629388  
1155350755 https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1155350755 https://api.github.com/repos/simonw/sqlite-utils/issues/440 IC_kwDOCGYnMM5E3UDj simonw 9599 2022-06-14T15:25:18Z 2022-06-14T15:25:18Z OWNER

That broke mypy:

sqlite_utils/utils.py:229: error: Incompatible types in assignment (expression has type "Iterable[Dict[Any, Any]]", variable has type "DictReader[str]")
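
That's the usual mypy complaint about a variable's type being pinned by its first assignment; one generic workaround (illustrative only, not necessarily the fix the library used) is to annotate the variable with the wider type up front:

```python
import csv
import io
from typing import Dict, Iterable

fp = io.StringIO("id,name\n1,Cleo")
# Annotating with the wider Iterable type means reassigning the variable
# to a generator later no longer conflicts with DictReader[str]
rows: Iterable[Dict[str, str]] = csv.DictReader(fp)
rows = (dict(row) for row in rows)
print(list(rows))
```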

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
CSV files with too many values in a row cause errors 1250629388  
1155317293 https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1155317293 https://api.github.com/repos/simonw/sqlite-utils/issues/440 IC_kwDOCGYnMM5E3L4t simonw 9599 2022-06-14T15:04:01Z 2022-06-14T15:04:01Z OWNER

I think that's unavoidable: it looks like csv.Sniffer only works if you feed it a CSV file with an equal number of values in each row, which is understandable.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
CSV files with too many values in a row cause errors 1250629388  
1155310521 https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1155310521 https://api.github.com/repos/simonw/sqlite-utils/issues/440 IC_kwDOCGYnMM5E3KO5 simonw 9599 2022-06-14T14:58:50Z 2022-06-14T14:58:50Z OWNER

Interesting challenge in writing tests for this: if you give csv.Sniffer a short example with an invalid row in it, sometimes it picks the wrong delimiter!

```
id,name\r\n1,Cleo,oops
```

It decided the delimiter there was `e`.
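
That misfire can be reproduced directly; a minimal sketch (the sniffed delimiter shown is the one reported above):

```python
import csv

# Sniffer infers the dialect from a sample; a ragged row in a very
# short sample can make it settle on a nonsense delimiter
sample = "id,name\r\n1,Cleo,oops"
dialect = csv.Sniffer().sniff(sample)
print(repr(dialect.delimiter))  # the comment above reports 'e' here
```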

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
CSV files with too many values in a row cause errors 1250629388  
1154475454 https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1154475454 https://api.github.com/repos/simonw/sqlite-utils/issues/440 IC_kwDOCGYnMM5Ez-W- simonw 9599 2022-06-13T21:52:03Z 2022-06-13T21:52:03Z OWNER

The exception will be called RowError.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
CSV files with too many values in a row cause errors 1250629388  
1154474482 https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1154474482 https://api.github.com/repos/simonw/sqlite-utils/issues/440 IC_kwDOCGYnMM5Ez-Hy simonw 9599 2022-06-13T21:50:59Z 2022-06-13T21:51:24Z OWNER

Decision: I'm going to default to raising an exception if a row has too many values in it.

You'll be able to pass `ignore_extras=True` to ignore those extra values, or pass `restkey="the_rest"` to collect them in a list in a column of that name.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
CSV files with too many values in a row cause errors 1250629388  
1154457893 https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1154457893 https://api.github.com/repos/simonw/sqlite-utils/issues/440 IC_kwDOCGYnMM5Ez6El simonw 9599 2022-06-13T21:29:02Z 2022-06-13T21:29:02Z OWNER

Here's the current function signature for rows_from_file():

https://github.com/simonw/sqlite-utils/blob/26e6d2622c57460a24ffdd0128bbaac051d51a5f/sqlite_utils/utils.py#L174-L179
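
For reference, the signature behind that permalink looked roughly like this; a reconstruction from the pre-change code, so treat names and defaults as approximate:

```python
import csv
from typing import BinaryIO, Iterable, Optional, Tuple, Type

from sqlite_utils.utils import Format

# Approximate pre-change signature of rows_from_file() (reconstructed,
# not copied from the linked commit)
def rows_from_file(
    fp: BinaryIO,
    format: Optional[Format] = None,
    dialect: Optional[Type[csv.Dialect]] = None,
    encoding: Optional[str] = None,
) -> Tuple[Iterable[dict], Format]:
    ...
```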

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
CSV files with too many values in a row cause errors 1250629388  
1154457028 https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1154457028 https://api.github.com/repos/simonw/sqlite-utils/issues/440 IC_kwDOCGYnMM5Ez53E simonw 9599 2022-06-13T21:28:03Z 2022-06-13T21:28:03Z OWNER

Whatever I decide, I can implement it in rows_from_file(), maybe as an optional parameter - then decide how to call it from the sqlite-utils insert CLI (perhaps with a new option there too).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
CSV files with too many values in a row cause errors 1250629388  
1154456183 https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1154456183 https://api.github.com/repos/simonw/sqlite-utils/issues/440 IC_kwDOCGYnMM5Ez5p3 simonw 9599 2022-06-13T21:26:55Z 2022-06-13T21:26:55Z OWNER

So I need to make a design decision here: what should sqlite-utils do with CSV files that have rows with more values than there are headings?

Some options:

  • Ignore those extra fields entirely - silently drop that data. I'm not keen on this.
  • Throw an error. The library does this already, but the error is incomprehensible - it could turn into a useful, human-readable error instead.
  • Put the data in a JSON list in a column with a known name (None is not a valid column name, so not that). This could be something like _restkey or _values_with_no_heading. This feels like a better option, but I'd need to carefully pick a name for it - and come up with an answer for the question of what to do if the CSV file being imported already uses that heading name for something else. (A sketch of this approach follows below.)
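
That third option can be prototyped on top of DictReader in a few lines; a hypothetical sketch (the helper and column names here are made up):

```python
import csv
import io
import json

def rows_with_extras(fp, extras_key="_values_with_no_heading"):
    # Hypothetical helper: move DictReader's None key (the values that
    # had no heading) into a JSON list stored under a named column
    for row in csv.DictReader(fp):
        extras = row.pop(None, None)
        if extras is not None:
            row[extras_key] = json.dumps(extras)
        yield row

print(list(rows_with_extras(io.StringIO("id,name\n1,Cleo,oops\n2,Barry"))))
# [{'id': '1', 'name': 'Cleo', '_values_with_no_heading': '["oops"]'},
#  {'id': '2', 'name': 'Barry'}]
```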
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
CSV files with too many values in a row cause errors 1250629388  
1154454127 https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1154454127 https://api.github.com/repos/simonw/sqlite-utils/issues/440 IC_kwDOCGYnMM5Ez5Jv simonw 9599 2022-06-13T21:24:18Z 2022-06-13T21:24:18Z OWNER

That weird behaviour is documented here: https://docs.python.org/3/library/csv.html#csv.DictReader

If a row has more fields than fieldnames, the remaining data is put in a list and stored with the fieldname specified by restkey (which defaults to None). If a non-blank row has fewer fields than fieldnames, the missing values are filled-in with the value of restval (which defaults to None).
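
Both parameters are easy to demonstrate:

```python
import csv
import io

# restkey names the column for surplus values; restval fills in
# missing values when a row is too short
rows = list(csv.DictReader(
    io.StringIO("id,name\n1,Cleo,x\n2"),
    restkey="extra",
    restval="",
))
print(rows)
# [{'id': '1', 'name': 'Cleo', 'extra': ['x']}, {'id': '2', 'name': ''}]
```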

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
CSV files with too many values in a row cause errors 1250629388  
1154453319 https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1154453319 https://api.github.com/repos/simonw/sqlite-utils/issues/440 IC_kwDOCGYnMM5Ez49H simonw 9599 2022-06-13T21:23:16Z 2022-06-13T21:23:16Z OWNER

Aha! I think I see what's happening here. Here's what DictReader does if one of the lines has too many items in it:

```pycon
>>> import csv, io
>>> list(csv.DictReader(io.StringIO("id,name\n1,Cleo,nohead\n2,Barry")))
[{'id': '1', 'name': 'Cleo', None: ['nohead']}, {'id': '2', 'name': 'Barry'}]
```

See how the row with too many items gets this: `{'id': '1', 'name': 'Cleo', None: ['nohead']}`

That's a `None` for the key and (weirdly) a list containing the single item for the value!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
CSV files with too many values in a row cause errors 1250629388  
1154449442 https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1154449442 https://api.github.com/repos/simonw/sqlite-utils/issues/440 IC_kwDOCGYnMM5Ez4Ai simonw 9599 2022-06-13T21:18:26Z 2022-06-13T21:20:12Z OWNER

Here are full steps to replicate the bug:

```python
from urllib.request import urlopen
import sqlite_utils

db = sqlite_utils.Database(memory=True)
with urlopen("https://artsdatabanken.no/Fab2018/api/export/csv") as fab:
    reader, other = sqlite_utils.utils.rows_from_file(fab, encoding="utf-16le")
    db["fab2018"].insert_all(reader, pk="Id")
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
CSV files with too many values in a row cause errors 1250629388  
1154396400 https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1154396400 https://api.github.com/repos/simonw/sqlite-utils/issues/440 IC_kwDOCGYnMM5EzrDw simonw 9599 2022-06-13T20:28:25Z 2022-06-13T20:28:25Z OWNER

Fixing that key thing (to ignore any key that is None) revealed a new bug:

```
File ~/Dropbox/Development/sqlite-utils/sqlite_utils/utils.py:376, in hash_record(record, keys)
    373 if keys is not None:
    374     to_hash = {key: record[key] for key in keys}
    375 return hashlib.sha1(
--> 376     json.dumps(to_hash, separators=(",", ":"), sort_keys=True, default=repr).encode(
    377         "utf8"
    378     )
    379 ).hexdigest()

File ~/.pyenv/versions/3.8.2/lib/python3.8/json/__init__.py:234, in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, default, sort_keys, **kw)
    232 if cls is None:
    233     cls = JSONEncoder
--> 234 return cls(
    235     skipkeys=skipkeys, ensure_ascii=ensure_ascii,
    236     check_circular=check_circular, allow_nan=allow_nan, indent=indent,
    237     separators=separators, default=default, sort_keys=sort_keys,
    238     **kw).encode(obj)

File ~/.pyenv/versions/3.8.2/lib/python3.8/json/encoder.py:199, in JSONEncoder.encode(self, o)
    195     return encode_basestring(o)
    196 # This doesn't pass the iterator directly to ''.join() because the
    197 # exceptions aren't as detailed. The list call should be roughly
    198 # equivalent to the PySequence_Fast that ''.join() would do.
--> 199 chunks = self.iterencode(o, _one_shot=True)
    200 if not isinstance(chunks, (list, tuple)):
    201     chunks = list(chunks)

File ~/.pyenv/versions/3.8.2/lib/python3.8/json/encoder.py:257, in JSONEncoder.iterencode(self, o, _one_shot)
    252 else:
    253     _iterencode = _make_iterencode(
    254         markers, self.default, _encoder, self.indent, floatstr,
    255         self.key_separator, self.item_separator, self.sort_keys,
    256         self.skipkeys, _one_shot)
--> 257 return _iterencode(o, 0)

TypeError: '<' not supported between instances of 'NoneType' and 'str'
```
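
The root cause reduces to two lines: `sort_keys=True` has to order the keys, and `None` doesn't compare against `str` in Python 3:

```python
import json

# sorting the keys compares None against "a", which Python 3 forbids
json.dumps({None: 1, "a": 2}, sort_keys=True)
# TypeError: '<' not supported between instances of 'NoneType' and 'str'
```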

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
CSV files with too many values in a row cause errors 1250629388  
1154387591 https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1154387591 https://api.github.com/repos/simonw/sqlite-utils/issues/440 IC_kwDOCGYnMM5Ezo6H simonw 9599 2022-06-13T20:17:51Z 2022-06-13T20:17:51Z OWNER

I don't understand why that works but calling insert_all() does not.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
CSV files with too many values in a row cause errors 1250629388  
1154386795 https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1154386795 https://api.github.com/repos/simonw/sqlite-utils/issues/440 IC_kwDOCGYnMM5Ezotr simonw 9599 2022-06-13T20:16:53Z 2022-06-13T20:16:53Z OWNER

Steps to demonstrate that sqlite-utils insert is not affected:

```bash
curl -o artsdatabanken.csv https://artsdatabanken.no/Fab2018/api/export/csv
sqlite-utils insert arts.db artsdatabanken artsdatabanken.csv --sniff --csv --encoding utf-16le
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
CSV files with too many values in a row cause errors 1250629388  
1154385916 https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1154385916 https://api.github.com/repos/simonw/sqlite-utils/issues/440 IC_kwDOCGYnMM5Ezof8 simonw 9599 2022-06-13T20:15:49Z 2022-06-13T20:15:49Z OWNER

rows_from_file() isn't part of the documented API but maybe it should be!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
CSV files with too many values in a row cause errors 1250629388  

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
, [performed_via_github_app] TEXT);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);