home / github / issue_comments

Menu
  • Search all tables
  • GraphQL API

issue_comments: 884672647

This data as json

html_url issue_url id node_id user created_at updated_at author_association body reactions issue performed_via_github_app
https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-884672647 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 884672647 IC_kwDODFE5qs40uwiH 28565 2021-07-22T05:56:31Z 2021-07-22T14:03:08Z NONE

How does this commit look? https://github.com/maxhawkins/google-takeout-to-sqlite/commit/72802a83fee282eb5d02d388567731ba4301050d

It seems that Takeout's mbox format is pretty simple, so we can get away with just splitting the file on lines begining with From. My commit just splits the file every time a line starts with From and uses email.message_from_bytes to parse each chunk.

I was able to load a 12GB takeout mbox without the program using more than a couple hundred MB of memory during the import process. It does make us lose the progress bar, but maybe I can add that back in a later commit.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
813880401  
Powered by Datasette · Queries took 1.145ms · About: github-to-sqlite