html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,issue,performed_via_github_app https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790391711,https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5,790391711,MDEyOklzc3VlQ29tbWVudDc5MDM5MTcxMQ==,306240,2021-03-04T07:36:24Z,2021-03-04T07:36:24Z,NONE,"> Looks like you're doing this: > > ```python > elif message.get_content_type() == ""text/plain"": > body = message.get_payload(decode=True) > ``` > > So presumably that decodes to a unicode string? > > I imagine the reason the column is a `BLOB` for me is that `sqlite-utils` determines the column type based on the first batch of items - https://github.com/simonw/sqlite-utils/blob/09c3386f55f766b135b6a1c00295646c4ae29bec/sqlite_utils/db.py#L1927-L1928 - and I got unlucky and had something in my first batch that wasn't a unicode string. Ah, that's good to know. I think explicitly creating the tables will be a great improvement. I'll add that. Also, I noticed after I opened this PR that the `message.get_payload()` is being deprecated in favor of `message.get_content()` or something like that. I'll see if that handles the decoding better, too. Thanks for the feedback. I should have time tomorrow to put together some improvements.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",813880401, https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790389335,https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5,790389335,MDEyOklzc3VlQ29tbWVudDc5MDM4OTMzNQ==,306240,2021-03-04T07:32:04Z,2021-03-04T07:32:04Z,NONE,"> The command takes quite a while to start running, presumably because this line causes it to have to scan the WHOLE file in order to generate a count: > > https://github.com/dogsheep/google-takeout-to-sqlite/blob/a3de045eba0fae4b309da21aa3119102b0efc576/google_takeout_to_sqlite/utils.py#L66-L67 > > I'm fine with waiting though. It's not like this is a command people run every day - and without that count we can't show a progress bar, which seems pretty important for a process that takes this long. The wait is from python loading the mbox file. This happens regardless if you're getting the length of the mbox. The mbox module is on the slow side. It is possible to do one's own parsing of the mbox, but I kind of wanted to avoid doing that.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",813880401,