html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,issue,performed_via_github_app
https://github.com/simonw/sqlite-utils/issues/279#issuecomment-864330508,https://api.github.com/repos/simonw/sqlite-utils/issues/279,864330508,MDEyOklzc3VlQ29tbWVudDg2NDMzMDUwOA==,9599,2021-06-19T00:34:24Z,2021-06-19T00:34:24Z,OWNER,"Got this working:
% curl 'https://api.github.com/repos/simonw/datasette/issues' | sqlite-utils memory - 'select id from stdin' ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",924990677,
https://github.com/simonw/sqlite-utils/issues/279#issuecomment-864328927,https://api.github.com/repos/simonw/sqlite-utils/issues/279,864328927,MDEyOklzc3VlQ29tbWVudDg2NDMyODkyNw==,9599,2021-06-19T00:25:08Z,2021-06-19T00:25:17Z,OWNER,"I tried writing this function with type hints, but eventually gave up:
```python
def rows_from_file(
fp: BinaryIO,
format: Optional[Format] = None,
dialect: Optional[Type[csv.Dialect]] = None,
encoding: Optional[str] = None,
) -> Generator[dict, None, None]:
if format == Format.JSON:
decoded = json.load(fp)
if isinstance(decoded, dict):
decoded = [decoded]
if not isinstance(decoded, list):
raise RowsFromFileBadJSON(""JSON must be a list or a dictionary"")
yield from decoded
elif format == Format.CSV:
decoded_fp = io.TextIOWrapper(fp, encoding=encoding or ""utf-8-sig"")
yield from csv.DictReader(decoded_fp)
elif format == Format.TSV:
yield from rows_from_file(
fp, format=Format.CSV, dialect=csv.excel_tab, encoding=encoding
)
elif format is None:
# Detect the format, then call this recursively
buffered = io.BufferedReader(fp, buffer_size=4096)
first_bytes = buffered.peek(2048).strip()
if first_bytes[0] in (b""["", b""{""):
# TODO: Detect newline-JSON
yield from rows_from_file(fp, format=Format.JSON)
else:
dialect = csv.Sniffer().sniff(first_bytes.decode(encoding, ""ignore""))
yield from rows_from_file(
fp, format=Format.CSV, dialect=dialect, encoding=encoding
)
else:
raise RowsFromFileError(""Bad format"")
```
mypy said:
```
sqlite_utils/utils.py:157: error: Argument 1 to ""BufferedReader"" has incompatible type ""BinaryIO""; expected ""RawIOBase""
sqlite_utils/utils.py:163: error: Argument 1 to ""decode"" of ""bytes"" has incompatible type ""Optional[str]""; expected ""str""
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",924990677,
https://github.com/simonw/sqlite-utils/issues/279#issuecomment-864208476,https://api.github.com/repos/simonw/sqlite-utils/issues/279,864208476,MDEyOklzc3VlQ29tbWVudDg2NDIwODQ3Ng==,9599,2021-06-18T18:30:08Z,2021-06-18T23:30:19Z,OWNER,"So maybe this is a function which can either be told the format or, if none is provided, it detects one for itself.
```python
def rows_from_file(fp, format=None):
# ...
yield from rows
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",924990677,
https://github.com/simonw/sqlite-utils/issues/279#issuecomment-864207841,https://api.github.com/repos/simonw/sqlite-utils/issues/279,864207841,MDEyOklzc3VlQ29tbWVudDg2NDIwNzg0MQ==,9599,2021-06-18T18:28:40Z,2021-06-18T18:28:46Z,OWNER,"```python
def detect_format(fp):
# ...
return ""csv"", fp, dialect
# or
return ""json"", fp, parsed_data
# or
return ""json-nl"", fp, docs
```
The mixed return types here are ugly. In all of these cases what we really want is to return a generator of `{...}` objects. So maybe it returns that instead.
```python
def filepointer_to_documents(fp):
# ...
yield from documents
```
I can refactor `sqlite-utils insert` to use this new code too.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",924990677,
https://github.com/simonw/sqlite-utils/issues/279#issuecomment-864206308,https://api.github.com/repos/simonw/sqlite-utils/issues/279,864206308,MDEyOklzc3VlQ29tbWVudDg2NDIwNjMwOA==,9599,2021-06-18T18:25:04Z,2021-06-18T18:25:04Z,OWNER,"Or... since I'm not using a streaming JSON parser at the moment, if I think something is JSON I can load the entire thing into memory to validate it.
I still need to detect newline-delimited JSON. For that I can consume the first line of the input to see if it's a valid JSON object, then maybe sniff the second line too?
This does mean that if the input is a single line of GIANT JSON it will all be consumed into memory at once, but that's going to happen anyway.
So I need a function which, given a file pointer, consumes from it, detects the type, then returns that type AND a file pointer to the beginning of the file again. I can use `io.BufferedReader` for this.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",924990677,
https://github.com/simonw/sqlite-utils/issues/279#issuecomment-864129273,https://api.github.com/repos/simonw/sqlite-utils/issues/279,864129273,MDEyOklzc3VlQ29tbWVudDg2NDEyOTI3Mw==,9599,2021-06-18T15:47:47Z,2021-06-18T15:47:47Z,OWNER,"Detecting valid JSON is tricky - just because a stream starts with `[` or `{` doesn't mean the entire stream is valid JSON. You need to parse the entire stream to determine that for sure.
One way to solve this would be with a custom state machine. Another would be to use the `ijson` streaming parser - annoyingly it throws the same exception class for invalid JSON for different reasons, but the `e.args[0]` for that exception includes human-readable text about the error - if it's anything other than `parse error: premature EOF` then it probably means the JSON was invalid.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",924990677,
https://github.com/simonw/sqlite-utils/issues/279#issuecomment-864103005,https://api.github.com/repos/simonw/sqlite-utils/issues/279,864103005,MDEyOklzc3VlQ29tbWVudDg2NDEwMzAwNQ==,9599,2021-06-18T15:04:15Z,2021-06-18T15:04:15Z,OWNER,"To detect JSON, check to see if the stream starts with `[` or `{` - maybe do something more sophisticated than that.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",924990677,