github
html_url | issue_url | id | node_id | user | created_at | updated_at | author_association | body | reactions | issue | performed_via_github_app |
---|---|---|---|---|---|---|---|---|---|---|---|
https://github.com/simonw/sqlite-utils/issues/279#issuecomment-864208476 | https://api.github.com/repos/simonw/sqlite-utils/issues/279 | 864208476 | MDEyOklzc3VlQ29tbWVudDg2NDIwODQ3Ng== | 9599 | 2021-06-18T18:30:08Z | 2021-06-18T23:30:19Z | OWNER | So maybe this is a function which can either be told the format or, if none is provided, detect one for itself. ```python def rows_from_file(fp, format=None): # ... yield from rows ``` | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | 924990677 | |
https://github.com/simonw/sqlite-utils/issues/279#issuecomment-864207841 | https://api.github.com/repos/simonw/sqlite-utils/issues/279 | 864207841 | MDEyOklzc3VlQ29tbWVudDg2NDIwNzg0MQ== | 9599 | 2021-06-18T18:28:40Z | 2021-06-18T18:28:46Z | OWNER | ```python def detect_format(fp): # ... return "csv", fp, dialect # or return "json", fp, parsed_data # or return "json-nl", fp, docs ``` The mixed return types here are ugly. In all of these cases what we really want is to return a generator of `{...}` objects. So maybe it returns that instead. ```python def filepointer_to_documents(fp): # ... yield from documents ``` I can refactor `sqlite-utils insert` to use this new code too. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | 924990677 | |
https://github.com/simonw/sqlite-utils/issues/279#issuecomment-864206308 | https://api.github.com/repos/simonw/sqlite-utils/issues/279 | 864206308 | MDEyOklzc3VlQ29tbWVudDg2NDIwNjMwOA== | 9599 | 2021-06-18T18:25:04Z | 2021-06-18T18:25:04Z | OWNER | Or... since I'm not using a streaming JSON parser at the moment, if I think something is JSON I can load the entire thing into memory to validate it. I still need to detect newline-delimited JSON. For that I can consume the first line of the input to see if it's a valid JSON object, then maybe sniff the second line too? This does mean that if the input is a single line of GIANT JSON it will all be consumed into memory at once, but that's going to happen anyway. So I need a function which, given a file pointer, consumes from it, detects the type, then returns that type AND a file pointer to the beginning of the file again. I can use `io.BufferedReader` for this. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | 924990677 | |
https://github.com/simonw/sqlite-utils/issues/279#issuecomment-864129273 | https://api.github.com/repos/simonw/sqlite-utils/issues/279 | 864129273 | MDEyOklzc3VlQ29tbWVudDg2NDEyOTI3Mw== | 9599 | 2021-06-18T15:47:47Z | 2021-06-18T15:47:47Z | OWNER | Detecting valid JSON is tricky - just because a stream starts with `[` or `{` doesn't mean the entire stream is valid JSON. You need to parse the entire stream to determine that for sure. One way to solve this would be with a custom state machine. Another would be to use the `ijson` streaming parser - annoyingly it throws the same exception class for several different kinds of invalid JSON, but `e.args[0]` on that exception includes human-readable text about the error - if it's anything other than `parse error: premature EOF` then it probably means the JSON was invalid. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | 924990677 | |
https://github.com/simonw/sqlite-utils/issues/279#issuecomment-864103005 | https://api.github.com/repos/simonw/sqlite-utils/issues/279 | 864103005 | MDEyOklzc3VlQ29tbWVudDg2NDEwMzAwNQ== | 9599 | 2021-06-18T15:04:15Z | 2021-06-18T15:04:15Z | OWNER | To detect JSON, check to see if the stream starts with `[` or `{` - maybe do something more sophisticated than that. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } | 924990677 | |
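One of the comments above proposes using `io.BufferedReader` to detect the type of a stream and still hand the full content to the real parser. A minimal sketch of that trick (the sample bytes here are purely illustrative): `peek()` returns buffered data without advancing the stream position, so sniffing the start of an unseekable stream such as `sys.stdin.buffer` does not lose any bytes.

```python
import io

# Wrap an unseekable stream so we can look ahead without losing bytes:
# peek() returns buffered data but does not advance the stream position.
raw = io.BytesIO(b'{"id": 1, "name": "Cleo"}\n')
buf = io.BufferedReader(raw)

head = buf.peek(16)  # inspect the start of the stream (may return more bytes)
looks_like_json = head.lstrip().startswith((b"[", b"{"))

# The full stream is still available for the real parser:
everything = buf.read()
```

Note that `peek()` is best-effort: it returns at least one byte (unless at EOF) but may return more or fewer than requested, so detection logic should treat the peeked bytes as a hint rather than a fixed-size window.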
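Putting the ideas from the thread together - sniff the start of the stream, try parsing the first line to distinguish newline-delimited JSON, fall back to CSV, and always yield a generator of dicts - a format-detecting `rows_from_file(fp, format=None)` might look like the sketch below. This is a hypothetical illustration of the approach being discussed, not the implementation that eventually landed in `sqlite-utils`.

```python
import csv
import io
import json


def rows_from_file(fp, format=None):
    """Yield dicts from a binary file-like object, detecting the
    format (json, json-nl or csv) if one is not provided.

    Sketch only: peek() is used to sniff the stream without
    consuming it, so unseekable inputs like stdin still work.
    """
    buf = fp if isinstance(fp, io.BufferedReader) else io.BufferedReader(fp)
    if format is None:
        start = buf.peek(2048).lstrip()
        if start.startswith(b"["):
            format = "json"
        elif start.startswith(b"{"):
            # Could be one JSON object or newline-delimited JSON:
            # try parsing just the first line to tell them apart
            first_line = start.split(b"\n", 1)[0]
            try:
                json.loads(first_line)
                format = "json-nl"
            except ValueError:
                format = "json"
        else:
            format = "csv"
    if format == "json":
        parsed = json.loads(buf.read())
        if isinstance(parsed, dict):
            yield parsed
        else:
            yield from parsed
    elif format == "json-nl":
        for line in buf:
            line = line.strip()
            if line:
                yield json.loads(line)
    else:  # csv
        text = io.TextIOWrapper(buf, encoding="utf-8")
        yield from csv.DictReader(text)
```

As the comments note, a `[`/`{` prefix does not guarantee the whole stream is valid JSON - here an invalid stream simply surfaces as a `json.JSONDecodeError` from the parse, rather than being caught up front.

```python
rows = list(rows_from_file(io.BytesIO(b'[{"a": 1}, {"a": 2}]')))
# json: a list of objects
rows = list(rows_from_file(io.BytesIO(b'{"a": 1}\n{"a": 2}\n')))
# json-nl: one object per line
rows = list(rows_from_file(io.BytesIO(b"a,b\n1,2\n")))
# csv: header row plus data rows
```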