{"html_url": "https://github.com/simonw/sqlite-utils/issues/279#issuecomment-864330508", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/279", "id": 864330508, "node_id": "MDEyOklzc3VlQ29tbWVudDg2NDMzMDUwOA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-06-19T00:34:24Z", "updated_at": "2021-06-19T00:34:24Z", "author_association": "OWNER", "body": "Got this working:\r\n\r\n    % curl 'https://api.github.com/repos/simonw/datasette/issues' | sqlite-utils memory - 'select id from stdin' ", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 924990677, "label": "sqlite-utils memory should handle TSV and JSON in addition to CSV"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/279#issuecomment-864328927", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/279", "id": 864328927, "node_id": "MDEyOklzc3VlQ29tbWVudDg2NDMyODkyNw==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-06-19T00:25:08Z", "updated_at": "2021-06-19T00:25:17Z", "author_association": "OWNER", "body": "I tried writing this function with type hints, but eventually gave up:\r\n```python\r\ndef rows_from_file(\r\n    fp: BinaryIO,\r\n    format: Optional[Format] = None,\r\n    dialect: Optional[Type[csv.Dialect]] = None,\r\n    encoding: Optional[str] = None,\r\n) -> Generator[dict, None, None]:\r\n    if format == Format.JSON:\r\n        decoded = json.load(fp)\r\n        if isinstance(decoded, dict):\r\n            decoded = [decoded]\r\n        if not isinstance(decoded, list):\r\n            raise RowsFromFileBadJSON(\"JSON must be a list or a dictionary\")\r\n        yield from decoded\r\n    elif format == Format.CSV:\r\n        decoded_fp = io.TextIOWrapper(fp, encoding=encoding or \"utf-8-sig\")\r\n        yield from csv.DictReader(decoded_fp, dialect=dialect or csv.excel)\r\n    elif format == Format.TSV:\r\n        yield from rows_from_file(\r\n            fp, format=Format.CSV, dialect=csv.excel_tab, encoding=encoding\r\n        )\r\n    elif format is None:\r\n        # Detect the format, then call this recursively\r\n        buffered = io.BufferedReader(fp, buffer_size=4096)\r\n        first_bytes = buffered.peek(2048).strip()\r\n        if first_bytes[:1] in (b\"[\", b\"{\"):\r\n            # TODO: Detect newline-JSON\r\n            yield from rows_from_file(buffered, format=Format.JSON)\r\n        else:\r\n            dialect = csv.Sniffer().sniff(first_bytes.decode(encoding, \"ignore\"))\r\n            yield from rows_from_file(\r\n                buffered, format=Format.CSV, dialect=dialect, encoding=encoding\r\n            )\r\n    else:\r\n        raise RowsFromFileError(\"Bad format\")\r\n```\r\nmypy said:\r\n```\r\nsqlite_utils/utils.py:157: error: Argument 1 to \"BufferedReader\" has incompatible type \"BinaryIO\"; expected \"RawIOBase\"\r\nsqlite_utils/utils.py:163: error: Argument 1 to \"decode\" of \"bytes\" has incompatible type \"Optional[str]\"; expected \"str\"\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 924990677, "label": "sqlite-utils memory should handle TSV and JSON in addition to CSV"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/279#issuecomment-864208476", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/279", "id": 864208476, "node_id": "MDEyOklzc3VlQ29tbWVudDg2NDIwODQ3Ng==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-06-18T18:30:08Z", "updated_at": "2021-06-18T23:30:19Z", "author_association": "OWNER", "body": "So maybe this is a function which can either be told the format or, if none is provided, it detects one for itself.\r\n```python\r\ndef rows_from_file(fp, format=None):\r\n    # ...\r\n    yield from rows\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 924990677, "label": "sqlite-utils memory should 
handle TSV and JSON in addition to CSV"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/279#issuecomment-864207841", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/279", "id": 864207841, "node_id": "MDEyOklzc3VlQ29tbWVudDg2NDIwNzg0MQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-06-18T18:28:40Z", "updated_at": "2021-06-18T18:28:46Z", "author_association": "OWNER", "body": "```python\r\ndef detect_format(fp):\r\n    # ...\r\n    return \"csv\", fp, dialect\r\n    # or\r\n    return \"json\", fp, parsed_data\r\n    # or\r\n    return \"json-nl\", fp, docs\r\n```\r\nThe mixed return types here are ugly. In all of these cases what we really want is to return a generator of `{...}` objects. So maybe it returns that instead.\r\n```python\r\ndef filepointer_to_documents(fp):\r\n    # ...\r\n    yield from documents\r\n```\r\nI can refactor `sqlite-utils insert` to use this new code too.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 924990677, "label": "sqlite-utils memory should handle TSV and JSON in addition to CSV"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/279#issuecomment-864206308", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/279", "id": 864206308, "node_id": "MDEyOklzc3VlQ29tbWVudDg2NDIwNjMwOA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-06-18T18:25:04Z", "updated_at": "2021-06-18T18:25:04Z", "author_association": "OWNER", "body": "Or... since I'm not using a streaming JSON parser at the moment, if I think something is JSON I can load the entire thing into memory to validate it.\r\n\r\nI still need to detect newline-delimited JSON. 
For that I can consume the first line of the input to see if it's a valid JSON object, then maybe sniff the second line too?\r\n\r\nThis does mean that if the input is a single line of GIANT JSON it will all be consumed into memory at once, but that's going to happen anyway.\r\n\r\nSo I need a function which, given a file pointer, consumes from it, detects the type, then returns that type AND a file pointer to the beginning of the file again. I can use `io.BufferedReader` for this.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 924990677, "label": "sqlite-utils memory should handle TSV and JSON in addition to CSV"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/279#issuecomment-864129273", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/279", "id": 864129273, "node_id": "MDEyOklzc3VlQ29tbWVudDg2NDEyOTI3Mw==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-06-18T15:47:47Z", "updated_at": "2021-06-18T15:47:47Z", "author_association": "OWNER", "body": "Detecting valid JSON is tricky - just because a stream starts with `[` or `{` doesn't mean the entire stream is valid JSON. You need to parse the entire stream to determine that for sure.\r\n\r\nOne way to solve this would be with a custom state machine. 
Another would be to use the `ijson` streaming parser - annoyingly it throws the same exception class for invalid JSON for different reasons, but the `e.args[0]` for that exception includes human-readable text about the error - if it's anything other than `parse error: premature EOF` then it probably means the JSON was invalid.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 924990677, "label": "sqlite-utils memory should handle TSV and JSON in addition to CSV"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/279#issuecomment-864103005", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/279", "id": 864103005, "node_id": "MDEyOklzc3VlQ29tbWVudDg2NDEwMzAwNQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-06-18T15:04:15Z", "updated_at": "2021-06-18T15:04:15Z", "author_association": "OWNER", "body": "To detect JSON, check to see if the stream starts with `[` or `{` - maybe do something more sophisticated than that.\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 924990677, "label": "sqlite-utils memory should handle TSV and JSON in addition to CSV"}, "performed_via_github_app": null}