html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,issue,performed_via_github_app https://github.com/dogsheep/google-takeout-to-sqlite/issues/4#issuecomment-790934616,https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/4,790934616,MDEyOklzc3VlQ29tbWVudDc5MDkzNDYxNg==,203343,2021-03-04T20:54:44Z,2021-03-04T20:54:44Z,NONE,"Sorry for the delay, I got sidetracked after class last night. I am getting the following error: ``` /content# google-takeout-to-sqlite mbox takeout.db Takeout/Mail/gmail.mbox Usage: google-takeout-to-sqlite [OPTIONS] COMMAND [ARGS]...Try 'google-takeout-to-sqlite --help' for help. Error: No such command 'mbox'. ``` On the box, I installed with pip after cloning: https://github.com/UtahDave/google-takeout-to-sqlite.git","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",778380836, https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790695126,https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5,790695126,MDEyOklzc3VlQ29tbWVudDc5MDY5NTEyNg==,9599,2021-03-04T15:20:42Z,2021-03-04T15:20:42Z,MEMBER,"I'm not sure why but my most recent import, when displayed in Datasette, looks like this: Sorting by `id` in the opposite order gives me the data I would expect - so it looks like a bunch of null/blank messages are being imported at some point and showing up first due to ID ordering.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",813880401, https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790693674,https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5,790693674,MDEyOklzc3VlQ29tbWVudDc5MDY5MzY3NA==,9599,2021-03-04T15:18:36Z,2021-03-04T15:18:36Z,MEMBER,"I imported my 10GB mbox with 750,000 emails in it, ran this tool (with a hacked fix for the blob column problem) - and now a search that returns 92 results takes 25.37ms! This is fantastic.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",813880401, https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790669767,https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5,790669767,MDEyOklzc3VlQ29tbWVudDc5MDY2OTc2Nw==,9599,2021-03-04T14:46:06Z,2021-03-04T14:46:06Z,MEMBER,"Solution could be to pre-process that string by splitting on `(` and dropping everything afterwards, assuming that the `(...)` bit isn't necessary for correctly parsing the date.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",813880401, https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790668263,https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5,790668263,MDEyOklzc3VlQ29tbWVudDc5MDY2ODI2Mw==,9599,2021-03-04T14:43:58Z,2021-03-04T14:43:58Z,MEMBER,"I added this code to output a message ID on errors: ```diff print(""Errors: {}"".format(num_errors)) print(traceback.format_exc()) + print(""Message-Id: {}"".format(email.get(""Message-Id"", ""None""))) continue ``` Having found a message ID that had an error, I ran this command to see the context: rg --text --context 20 '44F289B0.000001.02100@SCHWARZE-DWFXMI' ~/gmail.mbox This was for the following error: ``` File ""/Users/simon/Dropbox/Development/google-takeout-to-sqlite/google_takeout_to_sqlite/utils.py"", line 102, in get_mbox message[""date""] = get_message_date(email.get(""Date""), email.get_from()) File ""/Users/simon/Dropbox/Development/google-takeout-to-sqlite/google_takeout_to_sqlite/utils.py"", line 178, in get_message_date datetime_tuple = email.utils.parsedate_tz(mail_date) File ""/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/email/_parseaddr.py"", line 50, in parsedate_tz res = _parsedate_tz(data) File ""/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/email/_parseaddr.py"", line 69, in _parsedate_tz data = data.split() AttributeError: 'Header' object has no attribute 'split' ``` Here's what I spotted in the `ripgrep` output: ``` 177133570:Message-Id: <44F289B0.000001.02100@SCHWARZE-DWFXMI> 177133571-Date: Mon, 28 Aug 2006 08:14:08 +0200 (Westeurop�ische Sommerzeit) 177133572-X-Mailer: IncrediMail (5002253) ``` So it could it be that `_parsedate_tz` is having trouble with that `Mon, 28 Aug 2006 08:14:08 +0200 (Westeurop�ische Sommerzeit)` string.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",813880401, https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790391711,https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5,790391711,MDEyOklzc3VlQ29tbWVudDc5MDM5MTcxMQ==,306240,2021-03-04T07:36:24Z,2021-03-04T07:36:24Z,NONE,"> Looks like you're doing this: > > ```python > elif message.get_content_type() == ""text/plain"": > body = message.get_payload(decode=True) > ``` > > So presumably that decodes to a unicode string? > > I imagine the reason the column is a `BLOB` for me is that `sqlite-utils` determines the column type based on the first batch of items - https://github.com/simonw/sqlite-utils/blob/09c3386f55f766b135b6a1c00295646c4ae29bec/sqlite_utils/db.py#L1927-L1928 - and I got unlucky and had something in my first batch that wasn't a unicode string. Ah, that's good to know. I think explicitly creating the tables will be a great improvement. I'll add that. Also, I noticed after I opened this PR that the `message.get_payload()` is being deprecated in favor of `message.get_content()` or something like that. I'll see if that handles the decoding better, too. Thanks for the feedback. I should have time tomorrow to put together some improvements.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",813880401, https://github.com/dogsheep/google-takeout-to-sqlite/issues/6#issuecomment-790384087,https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/6,790384087,MDEyOklzc3VlQ29tbWVudDc5MDM4NDA4Nw==,9599,2021-03-04T07:22:51Z,2021-03-04T07:22:51Z,MEMBER,#3 also mentions the conflicting version with other tools.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",821841046, https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790380839,https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5,790380839,MDEyOklzc3VlQ29tbWVudDc5MDM4MDgzOQ==,9599,2021-03-04T07:17:05Z,2021-03-04T07:17:05Z,MEMBER,"Looks like you're doing this: ```python elif message.get_content_type() == ""text/plain"": body = message.get_payload(decode=True) ``` So presumably that decodes to a unicode string? I imagine the reason the column is a `BLOB` for me is that `sqlite-utils` determines the column type based on the first batch of items - https://github.com/simonw/sqlite-utils/blob/09c3386f55f766b135b6a1c00295646c4ae29bec/sqlite_utils/db.py#L1927-L1928 - and I got unlucky and had something in my first batch that wasn't a unicode string. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",813880401, https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790379629,https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5,790379629,MDEyOklzc3VlQ29tbWVudDc5MDM3OTYyOQ==,9599,2021-03-04T07:14:41Z,2021-03-04T07:14:41Z,MEMBER,"Confirmed: removing the `len()` call does not speed things up, so it's reading through the entire file for some other purpose too.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",813880401, https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790378658,https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5,790378658,MDEyOklzc3VlQ29tbWVudDc5MDM3ODY1OA==,9599,2021-03-04T07:12:48Z,2021-03-04T07:12:48Z,MEMBER,"It looks like the `body` is being loaded into a BLOB column - so in Datasette default it looks like this: If I `datasette install datasette-render-binary` and then try again I get this: It would be great if we could store the `body` as unicode text instead. May have to do something clever to decode it based on some kind of charset header?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",813880401, https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790373024,https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5,790373024,MDEyOklzc3VlQ29tbWVudDc5MDM3MzAyNA==,9599,2021-03-04T07:01:58Z,2021-03-04T07:04:06Z,MEMBER,"I got 9 warnings that look like this: ``` Errors: 1 Traceback (most recent call last): File ""/Users/simon/Dropbox/Development/google-takeout-to-sqlite/google_takeout_to_sqlite/utils.py"", line 103, in get_mbox message[""date""] = get_message_date(email.get(""Date""), email.get_from()) File ""/Users/simon/Dropbox/Development/google-takeout-to-sqlite/google_takeout_to_sqlite/utils.py"", line 167, in get_message_date datetime_tuple = email.utils.parsedate_tz(mail_date) File ""/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/email/_parseaddr.py"", line 50, in parsedate_tz res = _parsedate_tz(data) File ""/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/email/_parseaddr.py"", line 69, in _parsedate_tz data = data.split() AttributeError: 'Header' object has no attribute 'split' ``` It would be useful if those warnings told me the message ID (or similar) of the affected message so I could grep for it in the `mbox` and see what was going on. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",813880401, https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790372621,https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5,790372621,MDEyOklzc3VlQ29tbWVudDc5MDM3MjYyMQ==,9599,2021-03-04T07:01:18Z,2021-03-04T07:01:18Z,MEMBER,"I'm not sure if it would work, but there is an alternative pattern for showing a progress bar against a really large file that I've used in `healthkit-to-sqlite` - you set the progress bar size to the size of the file in bytes, then update a counter as you read the file. https://github.com/dogsheep/healthkit-to-sqlite/blob/3eb2b06bfe3b4faaf10e9cf9dfcb28e3d16c14ff/healthkit_to_sqlite/cli.py#L24-L57 and https://github.com/dogsheep/healthkit-to-sqlite/blob/3eb2b06bfe3b4faaf10e9cf9dfcb28e3d16c14ff/healthkit_to_sqlite/utils.py#L4-L19 (the `progress_callback()` bit) is where that happens. It can be a bit of a convoluted pattern, and I'm not at all sure it would work for `mbox` files since it looks like that library has other reasons it needs to do a file scan rather than streaming it through one chunk of bytes at a time. So I imagine this would not work here.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",813880401, https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790370485,https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5,790370485,MDEyOklzc3VlQ29tbWVudDc5MDM3MDQ4NQ==,9599,2021-03-04T06:57:25Z,2021-03-04T06:57:48Z,MEMBER,"The command takes quite a while to start running, presumably because this line causes it to have to scan the WHOLE file in order to generate a count: https://github.com/dogsheep/google-takeout-to-sqlite/blob/a3de045eba0fae4b309da21aa3119102b0efc576/google_takeout_to_sqlite/utils.py#L66-L67 I'm fine with waiting though. It's not like this is a command people run every day - and without that count we can't show a progress bar, which seems pretty important for a process that takes this long.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",813880401, https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790369076,https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5,790369076,MDEyOklzc3VlQ29tbWVudDc5MDM2OTA3Ng==,9599,2021-03-04T06:54:46Z,2021-03-04T06:54:46Z,MEMBER,"The Rich-powered progress bar is pretty: ![rich](https://user-images.githubusercontent.com/9599/109923307-71f69200-7c73-11eb-9ee2-8f0a240f3994.gif) ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",813880401, https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790312268,https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5,790312268,MDEyOklzc3VlQ29tbWVudDc5MDMxMjI2OA==,9599,2021-03-04T05:48:16Z,2021-03-04T05:48:16Z,MEMBER,"Wow, my mbox is a 10.35 GB download!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",813880401, https://github.com/simonw/datasette/pull/1243#issuecomment-790311215,https://api.github.com/repos/simonw/datasette/issues/1243,790311215,MDEyOklzc3VlQ29tbWVudDc5MDMxMTIxNQ==,9599,2021-03-04T05:45:57Z,2021-03-04T05:45:57Z,OWNER,Thanks!,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",815955014, https://github.com/simonw/datasette/issues/268#issuecomment-790257263,https://api.github.com/repos/simonw/datasette/issues/268,790257263,MDEyOklzc3VlQ29tbWVudDc5MDI1NzI2Mw==,649467,2021-03-04T03:20:23Z,2021-03-04T03:20:23Z,NONE,"It's kind of an ugly hack, but you can try out what using the fts5 table as an actual datasette-accessible table looks like without changing any datasette code by creating yet another view on top of the fts5 table: `create view proxyview as select *, rank, table_fts as fts from table_fts;` That's now visible from datasette, just like any other view, but you can use `fts match escape_fts(search_string) order by rank`. This is only good as a proof of concept because you're inefficiently going from view -> fts5 external content table -> view -> data table. However, it does show it works.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",323718842,