7 rows where user = 306240 sorted by updated_at descending

View and edit SQL

Suggested facets: issue_url, reactions, created_at (date), updated_at (date)

user

  • UtahDave · 7

author_association

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions issue performed_via_github_app
791530093 https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-791530093 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 MDEyOklzc3VlQ29tbWVudDc5MTUzMDA5Mw== UtahDave 306240 2021-03-05T16:28:07Z 2021-03-05T16:28:07Z NONE

I just tried to run this on a small VPS instance with 2GB of memory and it crashed out of memory while processing a 12GB mbox from Takeout.

Is it possible to stream the emails to sqlite instead of loading it all into memory and upserting at once?

@maxhawkins a limitation of the python mbox module is it loads the entire mbox into memory. I did find another approach to this problem that didn't use the builtin python mbox module and created a generator so that it didn't have to load the whole mbox into memory. I was hoping to use standard library modules, but this might be a good reason to investigate that approach a bit more. My worry is making sure a custom processor handles all the ins and outs of the mbox format correctly.

Hm. As I'm writing this, I thought of something. I think I can parse each message one at a time, and then use an mbox function to load each message using the python mbox module. That way the mbox module can still deal with the specifics of the mbox format, but I can use a generator.

I'll give that a try. Thanks for the feedback @maxhawkins and @simonw. I'll give that a try.

@simonw can we hold off on merging this until I can test this new approach?

{
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
WIP: Add Gmail takeout mbox import 813880401  
790391711 https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790391711 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 MDEyOklzc3VlQ29tbWVudDc5MDM5MTcxMQ== UtahDave 306240 2021-03-04T07:36:24Z 2021-03-04T07:36:24Z NONE

Looks like you're doing this:

python elif message.get_content_type() == "text/plain": body = message.get_payload(decode=True)

So presumably that decodes to a unicode string?

I imagine the reason the column is a BLOB for me is that sqlite-utils determines the column type based on the first batch of items - https://github.com/simonw/sqlite-utils/blob/09c3386f55f766b135b6a1c00295646c4ae29bec/sqlite_utils/db.py#L1927-L1928 - and I got unlucky and had something in my first batch that wasn't a unicode string.

Ah, that's good to know. I think explicitly creating the tables will be a great improvement. I'll add that.

Also, I noticed after I opened this PR that the message.get_payload() is being deprecated in favor of message.get_content() or something like that. I'll see if that handles the decoding better, too.

Thanks for the feedback. I should have time tomorrow to put together some improvements.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
WIP: Add Gmail takeout mbox import 813880401  
790389335 https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790389335 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 MDEyOklzc3VlQ29tbWVudDc5MDM4OTMzNQ== UtahDave 306240 2021-03-04T07:32:04Z 2021-03-04T07:32:04Z NONE

The command takes quite a while to start running, presumably because this line causes it to have to scan the WHOLE file in order to generate a count:

https://github.com/dogsheep/google-takeout-to-sqlite/blob/a3de045eba0fae4b309da21aa3119102b0efc576/google_takeout_to_sqlite/utils.py#L66-L67

I'm fine with waiting though. It's not like this is a command people run every day - and without that count we can't show a progress bar, which seems pretty important for a process that takes this long.

The wait is from python loading the mbox file. This happens regardless if you're getting the length of the mbox. The mbox module is on the slow side. It is possible to do one's own parsing of the mbox, but I kind of wanted to avoid doing that.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
WIP: Add Gmail takeout mbox import 813880401  
784638394 https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-784638394 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 MDEyOklzc3VlQ29tbWVudDc4NDYzODM5NA== UtahDave 306240 2021-02-24T00:36:18Z 2021-02-24T00:36:18Z NONE

I noticed that @simonw is using black for formatting. I ran black on my additions in this PR.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
WIP: Add Gmail takeout mbox import 813880401  
783794520 https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-783794520 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 MDEyOklzc3VlQ29tbWVudDc4Mzc5NDUyMA== UtahDave 306240 2021-02-23T01:13:54Z 2021-02-23T01:13:54Z NONE

Also, @simonw I created a test based off the existing tests. I think it's working correctly

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
WIP: Add Gmail takeout mbox import 813880401  
783688547 https://github.com/dogsheep/google-takeout-to-sqlite/issues/4#issuecomment-783688547 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/4 MDEyOklzc3VlQ29tbWVudDc4MzY4ODU0Nw== UtahDave 306240 2021-02-22T21:31:28Z 2021-02-22T21:31:28Z NONE

@Btibert3 I've opened a PR with my initial attempt at this. Would you be willing to give this a try?

https://github.com/dogsheep/google-takeout-to-sqlite/pull/5

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Feature Request: Gmail 778380836  
780817596 https://github.com/dogsheep/google-takeout-to-sqlite/issues/4#issuecomment-780817596 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/4 MDEyOklzc3VlQ29tbWVudDc4MDgxNzU5Ng== UtahDave 306240 2021-02-17T20:01:35Z 2021-02-17T20:01:35Z NONE

I've got this almost working. Just needs some polish

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Feature Request: Gmail 778380836  

Advanced export

JSON shape: default, array, newline-delimited, object

CSV options:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
, [performed_via_github_app] TEXT);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);