github


html_url	issue_url	id	node_id	user	created_at	updated_at	author_association	body	reactions	issue
https://github.com/simonw/sqlite-utils/issues/227#issuecomment-778854808	https://api.github.com/repos/simonw/sqlite-utils/issues/227	778854808	MDEyOklzc3VlQ29tbWVudDc3ODg1NDgwOA==	9599	2021-02-14T22:46:54Z	2021-02-14T22:46:54Z	OWNER	Fix is released in 3.5.	{ "total_count": 1, "+1": 0, "-1": 0, "laugh": 0, "hooray": 1, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	807174161
https://github.com/simonw/sqlite-utils/issues/228#issuecomment-778851721	https://api.github.com/repos/simonw/sqlite-utils/issues/228	778851721	MDEyOklzc3VlQ29tbWVudDc3ODg1MTcyMQ==	9599	2021-02-14T22:23:46Z	2021-02-14T22:23:46Z	OWNER	I called this `--no-headers` for consistency with the existing output option: https://github.com/simonw/sqlite-utils/blob/427dace184c7da57f4a04df07b1e84cdae3261e8/sqlite_utils/cli.py#L61-L64	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	807437089
https://github.com/simonw/sqlite-utils/issues/228#issuecomment-778849394	https://api.github.com/repos/simonw/sqlite-utils/issues/228	778849394	MDEyOklzc3VlQ29tbWVudDc3ODg0OTM5NA==	9599	2021-02-14T22:06:53Z	2021-02-14T22:06:53Z	OWNER	For the moment I think just adding `--no-header` - which causes column names "unknown1,unknown2,..." to be used - should be enough. Users can import with that option, then use `sqlite-utils transform --rename` to rename them.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	807437089
https://github.com/simonw/sqlite-utils/issues/229#issuecomment-778844016	https://api.github.com/repos/simonw/sqlite-utils/issues/229	778844016	MDEyOklzc3VlQ29tbWVudDc3ODg0NDAxNg==	9599	2021-02-14T21:22:45Z	2021-02-14T21:22:45Z	OWNER	I'm going to use this pattern from https://stackoverflow.com/a/15063941 ```python import sys import csv maxInt = sys.maxsize while True: # decrease the maxInt value by factor 10 # as long as the OverflowError occurs. try: csv.field_size_limit(maxInt) break except OverflowError: maxInt = int(maxInt/10) ```	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	807817197
https://github.com/simonw/sqlite-utils/issues/229#issuecomment-778843503	https://api.github.com/repos/simonw/sqlite-utils/issues/229	778843503	MDEyOklzc3VlQ29tbWVudDc3ODg0MzUwMw==	9599	2021-02-14T21:18:51Z	2021-02-14T21:18:51Z	OWNER	I want to set this to the maximum allowed limit, which seems to be surprisingly hard! That StackOverflow thread is full of ideas for that, many of them involving `ctypes`. I'm a bit loath to add a dependency on `ctypes` though - even though it's in the Python standard library I worry that it might not be available on some architectures.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	807817197
https://github.com/simonw/sqlite-utils/issues/229#issuecomment-778843362	https://api.github.com/repos/simonw/sqlite-utils/issues/229	778843362	MDEyOklzc3VlQ29tbWVudDc3ODg0MzM2Mg==	9599	2021-02-14T21:17:53Z	2021-02-14T21:17:53Z	OWNER	Same issue as #227.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	807817197
https://github.com/simonw/sqlite-utils/issues/228#issuecomment-778811746	https://api.github.com/repos/simonw/sqlite-utils/issues/228	778811746	MDEyOklzc3VlQ29tbWVudDc3ODgxMTc0Ng==	9599	2021-02-14T17:39:30Z	2021-02-14T21:16:54Z	OWNER	I'm going to detach this from the #131 column types idea. The three things I need to handle here are: - The CSV file doesn't have a header row at all, so I need to specify what the column names should be - The CSV file DOES have a header row but I want to ignore it and use alternative column names - The CSV doesn't have a header row at all and I want to automatically use `unknown1,unknown2...` so I can start exploring it as quickly as possible. Here's a potential design that covers the first two: `--replace-header="foo,bar,baz"` - ignore whatever is in the first row and pretend it was this instead `--add-header="foo,bar,baz"` - add a first row with these details, to use as the header It doesn't cover the "give me unknown column names" case though.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	807437089
https://github.com/simonw/sqlite-utils/issues/228#issuecomment-778843086	https://api.github.com/repos/simonw/sqlite-utils/issues/228	778843086	MDEyOklzc3VlQ29tbWVudDc3ODg0MzA4Ng==	9599	2021-02-14T21:15:43Z	2021-02-14T21:15:43Z	OWNER	I'm not convinced the `.has_header()` rules are useful for the kind of CSV files I work with: https://github.com/python/cpython/blob/63298930fb531ba2bb4f23bc3b915dbf1e17e9e1/Lib/csv.py#L383 ```python def has_header(self, sample): # Creates a dictionary of types of data in each column. If any # column is of a single type (say, integers), except for the first # row, then the first row is presumed to be labels. If the type # can't be determined, it is assumed to be a string in which case # the length of the string is the determining factor: if all of the # rows except for the first are the same length, it's a header. # Finally, a 'vote' is taken at the end for each column, adding or # subtracting from the likelihood of the first row being a header. ```	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	807437089
https://github.com/simonw/sqlite-utils/issues/228#issuecomment-778842982	https://api.github.com/repos/simonw/sqlite-utils/issues/228	778842982	MDEyOklzc3VlQ29tbWVudDc3ODg0Mjk4Mg==	9599	2021-02-14T21:15:11Z	2021-02-14T21:15:11Z	OWNER	Implementation tip: I have code that reads the first row and uses it as headers here: https://github.com/simonw/sqlite-utils/blob/8f042ae1fd323995d966a94e8e6df85cc843b938/sqlite_utils/cli.py#L689-L691 So if I want to use `unknown1,unknown2...` I can do that by reading the first row, counting the number of columns, generating headers based on that range and then continuing to build that generator (maybe with `itertools.chain()` to replay the record we already read).	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	807437089
https://github.com/simonw/sqlite-utils/issues/227#issuecomment-778841704	https://api.github.com/repos/simonw/sqlite-utils/issues/227	778841704	MDEyOklzc3VlQ29tbWVudDc3ODg0MTcwNA==	9599	2021-02-14T21:05:20Z	2021-02-14T21:05:20Z	OWNER	This has also been reported in #229.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	807174161
https://github.com/simonw/sqlite-utils/pull/225#issuecomment-778841547	https://api.github.com/repos/simonw/sqlite-utils/issues/225	778841547	MDEyOklzc3VlQ29tbWVudDc3ODg0MTU0Nw==	9599	2021-02-14T21:04:13Z	2021-02-14T21:04:13Z	OWNER	I added a test and fixed this in #234 - thanks for the fix.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	797159961
https://github.com/simonw/sqlite-utils/issues/234#issuecomment-778841278	https://api.github.com/repos/simonw/sqlite-utils/issues/234	778841278	MDEyOklzc3VlQ29tbWVudDc3ODg0MTI3OA==	9599	2021-02-14T21:02:11Z	2021-02-14T21:02:11Z	OWNER	I managed to replicate this in a test: ```python def test_insert_all_with_extra_columns_in_later_chunks(fresh_db): chunk = [ {"record": "Record 1"}, {"record": "Record 2"}, {"record": "Record 3"}, {"record": "Record 4", "extra": 1}, ] fresh_db["t"].insert_all(chunk, batch_size=2, alter=True) assert list(fresh_db["t"].rows) == [ {"record": "Record 1", "extra": None}, {"record": "Record 2", "extra": None}, {"record": "Record 3", "extra": None}, {"record": "Record 4", "extra": 1}, ] ```	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	808046597
https://github.com/simonw/sqlite-utils/pull/225#issuecomment-778834504	https://api.github.com/repos/simonw/sqlite-utils/issues/225	778834504	MDEyOklzc3VlQ29tbWVudDc3ODgzNDUwNA==	9599	2021-02-14T20:09:30Z	2021-02-14T20:09:30Z	OWNER	Thanks for this. I'm going to try and get the test suite to run in Windows on GitHub Actions.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	797159961
https://github.com/simonw/sqlite-utils/issues/231#issuecomment-778829456	https://api.github.com/repos/simonw/sqlite-utils/issues/231	778829456	MDEyOklzc3VlQ29tbWVudDc3ODgyOTQ1Ng==	9599	2021-02-14T19:37:52Z	2021-02-14T19:37:52Z	OWNER	I'm going to add `limit` and `offset` to the following methods: - `rows_where()` - `search_sql()` - `search()`	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	808028757
https://github.com/simonw/sqlite-utils/issues/231#issuecomment-778828758	https://api.github.com/repos/simonw/sqlite-utils/issues/231	778828758	MDEyOklzc3VlQ29tbWVudDc3ODgyODc1OA==	9599	2021-02-14T19:33:14Z	2021-02-14T19:33:14Z	OWNER	The `limit=` parameter is currently only available on the `.search()` method - it would make sense to add this to other methods as well.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	808028757
https://github.com/simonw/sqlite-utils/pull/224#issuecomment-778828495	https://api.github.com/repos/simonw/sqlite-utils/issues/224	778828495	MDEyOklzc3VlQ29tbWVudDc3ODgyODQ5NQ==	9599	2021-02-14T19:31:06Z	2021-02-14T19:31:06Z	OWNER	I'm going to add an `offset=` parameter to support this case. Thanks for the suggestion!	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	792297010
https://github.com/simonw/sqlite-utils/issues/230#issuecomment-778827570	https://api.github.com/repos/simonw/sqlite-utils/issues/230	778827570	MDEyOklzc3VlQ29tbWVudDc3ODgyNzU3MA==	9599	2021-02-14T19:24:20Z	2021-02-14T19:24:20Z	OWNER	Here's the implementation in Python: https://github.com/python/cpython/blob/63298930fb531ba2bb4f23bc3b915dbf1e17e9e1/Lib/csv.py#L204-L225	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	808008305
https://github.com/simonw/sqlite-utils/issues/230#issuecomment-778824361	https://api.github.com/repos/simonw/sqlite-utils/issues/230	778824361	MDEyOklzc3VlQ29tbWVudDc3ODgyNDM2MQ==	9599	2021-02-14T18:59:22Z	2021-02-14T18:59:22Z	OWNER	I think I've got it. I can use `io.BufferedReader()` to get an object I can run `.peek(2048)` on, then wrap THAT in `io.TextIOWrapper`: ```python encoding = encoding or "utf-8" buffered = io.BufferedReader(json_file, buffer_size=4096) decoded = io.TextIOWrapper(buffered, encoding=encoding, line_buffering=True) if pk and len(pk) == 1: pk = pk[0] if csv or tsv: if sniff: # Read first 2048 bytes and use that to detect first_bytes = buffered.peek(2048) print('first_bytes', first_bytes) ```	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	808008305
https://github.com/simonw/sqlite-utils/issues/230#issuecomment-778821403	https://api.github.com/repos/simonw/sqlite-utils/issues/230	778821403	MDEyOklzc3VlQ29tbWVudDc3ODgyMTQwMw==	9599	2021-02-14T18:38:16Z	2021-02-14T18:38:16Z	OWNER	There are two code paths here that matter: - For a regular file, I can read the first 2048 bytes, then `.seek(0)` before continuing. That's easy. - `stdin` is harder. I need to read and buffer the first 2048 bytes, then pass an object to `csv.reader()` which will replay that chunk and then play the rest of stdin. I'm a bit stuck on the second one. Ideally I could use something like `itertools.chain()` but I can't find an equivalent for file-like objects.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	808008305
https://github.com/simonw/sqlite-utils/issues/230#issuecomment-778818639	https://api.github.com/repos/simonw/sqlite-utils/issues/230	778818639	MDEyOklzc3VlQ29tbWVudDc3ODgxODYzOQ==	9599	2021-02-14T18:22:38Z	2021-02-14T18:22:38Z	OWNER	Maybe I shouldn't be using `StreamReader` at all - https://www.python.org/dev/peps/pep-0400/ suggests that it should be deprecated in favour of `io.TextIOWrapper`. I'm using `StreamReader` due to this line: https://github.com/simonw/sqlite-utils/blob/726219c3503e77440975cd15b74d006639feb0f8/sqlite_utils/cli.py#L667-L668	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	808008305
https://github.com/simonw/sqlite-utils/issues/230#issuecomment-778817494	https://api.github.com/repos/simonw/sqlite-utils/issues/230	778817494	MDEyOklzc3VlQ29tbWVudDc3ODgxNzQ5NA==	9599	2021-02-14T18:16:06Z	2021-02-14T18:16:06Z	OWNER	Types involved: ``` (Pdb) type(json_file.raw) <class '_io.FileIO'> (Pdb) type(json_file) <class 'encodings.utf_8.StreamReader'> ```	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	808008305
https://github.com/simonw/sqlite-utils/issues/230#issuecomment-778816333	https://api.github.com/repos/simonw/sqlite-utils/issues/230	778816333	MDEyOklzc3VlQ29tbWVudDc3ODgxNjMzMw==	9599	2021-02-14T18:08:44Z	2021-02-14T18:08:44Z	OWNER	No, you can't `.seek(0)` on stdin: ``` File "/Users/simon/Dropbox/Development/sqlite-utils/sqlite_utils/cli.py", line 678, in insert_upsert_implementation json_file.raw.seek(0) OSError: [Errno 29] Illegal seek ```	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	808008305
https://github.com/simonw/sqlite-utils/issues/230#issuecomment-778815740	https://api.github.com/repos/simonw/sqlite-utils/issues/230	778815740	MDEyOklzc3VlQ29tbWVudDc3ODgxNTc0MA==	9599	2021-02-14T18:05:03Z	2021-02-14T18:05:03Z	OWNER	The challenge here is how to read the first 2048 bytes and then reset the incoming file. The Python docs example looks like this: ```python with open('example.csv', newline='') as csvfile: dialect = csv.Sniffer().sniff(csvfile.read(1024)) csvfile.seek(0) reader = csv.reader(csvfile, dialect) ``` Here's the relevant code in `sqlite-utils`: https://github.com/simonw/sqlite-utils/blob/726219c3503e77440975cd15b74d006639feb0f8/sqlite_utils/cli.py#L671-L679 The challenge is going to be having the `--sniff` option work with the progress bar. Here's how `file_progress()` works: https://github.com/simonw/sqlite-utils/blob/726219c3503e77440975cd15b74d006639feb0f8/sqlite_utils/utils.py#L106-L113 If `file.raw` is `stdin` can I do the equivalent of `csvfile.seek(0)` on it?	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	808008305
https://github.com/simonw/sqlite-utils/issues/230#issuecomment-778812684	https://api.github.com/repos/simonw/sqlite-utils/issues/230	778812684	MDEyOklzc3VlQ29tbWVudDc3ODgxMjY4NA==	9599	2021-02-14T17:45:16Z	2021-02-14T17:45:16Z	OWNER	Running this could take any CSV (or TSV) file and automatically detect the delimiter. If no header row is detected it could add `unknown1,unknown2` headers: sqlite-utils insert db.db data file.csv --sniff (Using `--sniff` would imply `--csv`) This could be called `--sniffer` instead but I like `--sniff` better.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	808008305
https://github.com/simonw/sqlite-utils/issues/228#issuecomment-778812050	https://api.github.com/repos/simonw/sqlite-utils/issues/228	778812050	MDEyOklzc3VlQ29tbWVudDc3ODgxMjA1MA==	9599	2021-02-14T17:41:30Z	2021-02-14T17:41:30Z	OWNER	I just spotted that `csv.Sniffer` in the Python standard library has a `.has_header(sample)` method which detects if the first row appears to be a header or not, which is interesting. https://docs.python.org/3/library/csv.html#csv.Sniffer	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	807437089
https://github.com/simonw/sqlite-utils/issues/228#issuecomment-778811934	https://api.github.com/repos/simonw/sqlite-utils/issues/228	778811934	MDEyOklzc3VlQ29tbWVudDc3ODgxMTkzNA==	9599	2021-02-14T17:40:48Z	2021-02-14T17:40:48Z	OWNER	Another pattern that might be useful is to generate a header that is just "unknown1,unknown2,unknown3" for each of the columns in the rest of the file. This makes it easy to e.g. facet-explore within Datasette to figure out the correct names, then use `sqlite-utils transform --rename` to rename the columns. I needed to do that for the https://bl.iro.bl.uk/work/ns/3037474a-761c-456d-a00c-9ef3c6773f4c example.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	807437089
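
The field-size loop quoted in the #229 comments (from the linked StackOverflow answer) can be written as a standalone, runnable function. This is a minimal sketch: the name `set_max_csv_field_size_limit` is hypothetical and not part of `sqlite-utils`, and it uses integer division (`//=`) instead of `int(maxInt/10)` so the value never passes through a float.

```python
import csv
import sys


def set_max_csv_field_size_limit() -> int:
    """Raise csv.field_size_limit() to the largest value this platform accepts.

    The csv module stores the limit as a C long, so sys.maxsize overflows on
    platforms where a C long is 32 bits (such as Windows). Start at
    sys.maxsize and divide by 10 until the call succeeds.
    """
    limit = sys.maxsize
    while True:
        try:
            csv.field_size_limit(limit)
            return limit
        except OverflowError:
            # Integer division keeps the value an exact int.
            limit //= 10


if __name__ == "__main__":
    print(set_max_csv_field_size_limit())
```

`csv.field_size_limit()` is a module-wide setting, so calling this once at startup affects every `csv.reader` created afterwards.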
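
The `unknown1,unknown2...` idea from #228 (read the first row, count its columns, then replay that row with `itertools.chain()`) might look like the sketch below. It assumes the input is any iterator of row lists, such as a `csv.reader`; the function name `rows_with_unknown_headers` is illustrative, not an existing `sqlite-utils` function.

```python
import csv
import io
import itertools
from typing import Dict, Iterator, List


def rows_with_unknown_headers(reader: Iterator[List[str]]) -> Iterator[Dict[str, str]]:
    """Yield each row as a dict keyed unknown1, unknown2, ... (hypothetical helper)."""
    try:
        first_row = next(reader)
    except StopIteration:
        return  # empty input: nothing to yield
    # One generated name per column in the first row.
    headers = ["unknown{}".format(i) for i in range(1, len(first_row) + 1)]
    # The first row is data, not a header, so replay it before the remaining rows.
    for row in itertools.chain([first_row], reader):
        # zip() stops at the shorter sequence: cells beyond the first row's
        # width are dropped, and short rows simply produce fewer keys.
        yield dict(zip(headers, row))


if __name__ == "__main__":
    sample = io.StringIO("1,Cleo,dog\n2,Pancakes,cat\n")
    for record in rows_with_unknown_headers(csv.reader(sample)):
        print(record)
```

Because the generator only ever holds one row in memory, this keeps streaming behaviour intact, which matters when the input is piped in on stdin.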
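
The doubt about `csv.Sniffer.has_header()` in #228 can be checked with the standard library alone. Its vote compares each later row's column types (or, for text, string lengths) against the first row, so a file made entirely of text columns of varying length can be reported as having a header it does not have. The two samples below are made up for illustration, and `looks_like_header` is just a thin wrapper; other inputs may be classified either way.

```python
import csv


def looks_like_header(sample: str) -> bool:
    """Return csv.Sniffer's guess about whether the first row is a header."""
    return csv.Sniffer().has_header(sample)


# A genuine header: the "age" column is numeric except in the first row.
with_header = "name,age\nCleo,5\nPancakes,4\n"
# No header at all: every column is text, so only string lengths are
# compared, and "Cleo"/"terrier" differ in length from "Pancakes"/"poodle".
without_header = "Cleo,terrier\nPancakes,poodle\n"

if __name__ == "__main__":
    print(looks_like_header(with_header))     # True
    print(looks_like_header(without_header))  # True, even though there is no header
```

This is consistent with the outcome of #228: an explicit `--no-headers` option rather than automatic header detection.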
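
The #234 test inserts four records with `batch_size=2` and `alter=True`, where only the last record has an `extra` key. The behaviour it checks can be sketched with the standard `sqlite3` module: before each batch is inserted, any key not yet present as a column is added with `ALTER TABLE ... ADD COLUMN`. This is an illustrative sketch, not the `sqlite-utils` implementation; `insert_all_with_alter` and its helpers are hypothetical names.

```python
import sqlite3
from typing import Any, Dict, Iterable, List


def _quote(identifier: str) -> str:
    """Quote an SQLite identifier, doubling any embedded double quotes."""
    return '"' + identifier.replace('"', '""') + '"'


def _existing_columns(conn: sqlite3.Connection, table: str) -> List[str]:
    # PRAGMA table_info returns one row per column (index 1 is the name),
    # and no rows at all if the table does not exist yet.
    return [row[1] for row in conn.execute("PRAGMA table_info({})".format(_quote(table)))]


def insert_all_with_alter(
    conn: sqlite3.Connection,
    table: str,
    records: Iterable[Dict[str, Any]],
    batch_size: int = 100,
) -> None:
    records = list(records)
    for start in range(0, len(records), batch_size):
        batch = records[start : start + batch_size]
        # Every key used anywhere in this batch, in first-seen order.
        keys = list(dict.fromkeys(key for record in batch for key in record))
        columns = _existing_columns(conn, table)
        if not columns:
            conn.execute(
                "CREATE TABLE {} ({})".format(_quote(table), ", ".join(_quote(k) for k in keys))
            )
        else:
            for key in keys:
                if key not in columns:
                    # A key first seen in a later batch: add the column before inserting.
                    conn.execute(
                        "ALTER TABLE {} ADD COLUMN {}".format(_quote(table), _quote(key))
                    )
        sql = "INSERT INTO {} ({}) VALUES ({})".format(
            _quote(table),
            ", ".join(_quote(k) for k in keys),
            ", ".join("?" for _ in keys),
        )
        # Records missing a key get NULL for that column.
        conn.executemany(sql, [[record.get(k) for k in keys] for record in batch])
```

Checking the columns per batch, rather than only once from the first batch, is the point the test exercises: the `extra` key does not appear until the second batch.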
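
#231 proposes adding `limit` and `offset` to `rows_where()`, `search_sql()` and `search()`. One SQLite detail matters when generating that SQL: `OFFSET` is only valid as part of a `LIMIT` clause, and a negative `LIMIT` means no upper bound, so an offset-only query needs `LIMIT -1`. The helper below is a hypothetical stand-in written against the standard `sqlite3` module, not the `sqlite-utils` method; its parameter names are assumptions.

```python
import sqlite3
from typing import Any, Dict, Iterator, List, Optional, Sequence


def rows_where(
    conn: sqlite3.Connection,
    table: str,
    where: Optional[str] = None,
    where_args: Sequence[Any] = (),
    order_by: Optional[str] = None,
    limit: Optional[int] = None,
    offset: Optional[int] = None,
) -> Iterator[Dict[str, Any]]:
    """Hypothetical helper: yield rows from table as dicts, with optional paging."""
    sql = 'SELECT * FROM "{}"'.format(table.replace('"', '""'))
    args: List[Any] = list(where_args)
    if where:
        sql += " WHERE " + where
    if order_by:
        sql += " ORDER BY " + order_by
    if limit is not None or offset is not None:
        # OFFSET is only valid inside a LIMIT clause, and a negative LIMIT
        # means "no upper bound", so an offset on its own becomes LIMIT -1.
        sql += " LIMIT ?"
        args.append(-1 if limit is None else limit)
    if offset is not None:
        sql += " OFFSET ?"
        args.append(offset)
    cursor = conn.execute(sql, args)
    columns = [description[0] for description in cursor.description]
    for row in cursor:
        yield dict(zip(columns, row))
```

Without an `order_by`, SQLite does not guarantee row order, so paging with `offset` is only stable when an ordering is supplied.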

Custom SQL query returning 26 rows