{"html_url": "https://github.com/dogsheep/google-takeout-to-sqlite/pull/8#issuecomment-1710950671", "issue_url": "https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/8", "id": 1710950671, "node_id": "IC_kwDODFE5qs5l-wkP", "user": {"value": 150855, "label": "iloveitaly"}, "created_at": "2023-09-08T01:22:49Z", "updated_at": "2023-09-08T01:22:49Z", "author_association": "NONE", "body": "Makes sense, thanks for explaining!", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 954546309, "label": "Add Gmail takeout mbox import (v2)"}, "performed_via_github_app": null} {"html_url": "https://github.com/dogsheep/google-takeout-to-sqlite/pull/8#issuecomment-1710380941", "issue_url": "https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/8", "id": 1710380941, "node_id": "IC_kwDODFE5qs5l8leN", "user": {"value": 28565, "label": "maxhawkins"}, "created_at": "2023-09-07T15:39:59Z", "updated_at": "2023-09-07T15:39:59Z", "author_association": "NONE", "body": "> @maxhawkins curious why you didn't use the stdlib `mailbox` to parse the `mbox` files?\r\n\r\nMailbox parses the entire mbox into memory. Using the lower level library lets us stream the emails in one at a time to support larger archives. Both libraries are in the stdlib.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 954546309, "label": "Add Gmail takeout mbox import (v2)"}, "performed_via_github_app": null} {"html_url": "https://github.com/dogsheep/google-takeout-to-sqlite/pull/8#issuecomment-1708945716", "issue_url": "https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/8", "id": 1708945716, "node_id": "IC_kwDODFE5qs5l3HE0", "user": {"value": 150855, "label": "iloveitaly"}, "created_at": "2023-09-06T19:12:33Z", "updated_at": "2023-09-06T19:12:33Z", "author_association": "NONE", "body": "@maxhawkins curious why you didn't use the stdlib `mailbox` to parse the `mbox` files?", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 954546309, "label": "Add Gmail takeout mbox import (v2)"}, "performed_via_github_app": null} {"html_url": "https://github.com/dogsheep/google-takeout-to-sqlite/pull/8#issuecomment-1003437288", "issue_url": "https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/8", "id": 1003437288, "node_id": "IC_kwDODFE5qs47zzzo", "user": {"value": 28565, "label": "maxhawkins"}, "created_at": "2021-12-31T19:06:20Z", "updated_at": "2021-12-31T19:06:20Z", "author_association": "NONE", "body": "> @maxhawkins how hard would it be to add an entry to the table that includes the HTML version of the email, if it exists? I just attempted your the PR branch on a very small mbox file, and it worked great. My use case is a research project and I need to access more than just the body plain text.\r\n\r\nShouldn't be hard. The easiest way is probably to remove the `if body.content_type == \"text/html\"` clause from [utils.py:254](https://github.com/dogsheep/google-takeout-to-sqlite/pull/8/commits/8e6d487b697ce2e8ad885acf613a157bfba84c59#diff-25ad9dd1ced1b8bfc37fda8444819c803232c08891e4af3d4064aa205d8174eaR254) and just return content directly without parsing.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 954546309, "label": "Add Gmail takeout mbox import (v2)"}, "performed_via_github_app": null} {"html_url": "https://github.com/dogsheep/google-takeout-to-sqlite/pull/8#issuecomment-1002735370", "issue_url": "https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/8", "id": 1002735370, "node_id": "IC_kwDODFE5qs47xIcK", "user": {"value": 203343, "label": "Btibert3"}, "created_at": "2021-12-29T18:58:23Z", "updated_at": "2021-12-29T18:58:23Z", "author_association": "NONE", "body": "@maxhawkins how hard would it be to add an entry to the table that includes the HTML version of the email, if it exists? I just attempted your the PR branch on a very small mbox file, and it worked great. My use case is a research project and I need to access more than just the body plain text.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 954546309, "label": "Add Gmail takeout mbox import (v2)"}, "performed_via_github_app": null} {"html_url": "https://github.com/dogsheep/google-takeout-to-sqlite/pull/8#issuecomment-896378525", "issue_url": "https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/8", "id": 896378525, "node_id": "IC_kwDODFE5qs41baad", "user": {"value": 28565, "label": "maxhawkins"}, "created_at": "2021-08-10T23:28:45Z", "updated_at": "2021-08-10T23:28:45Z", "author_association": "NONE", "body": "I added parsing of text/html emails using BeautifulSoup.\r\n\r\nAround half of the emails in my archive don't include a text/plain payload so adding html parsing makes a good chunk of them searchable.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 954546309, "label": "Add Gmail takeout mbox import (v2)"}, "performed_via_github_app": null} {"html_url": "https://github.com/dogsheep/google-takeout-to-sqlite/pull/8#issuecomment-894581223", "issue_url": "https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/8", "id": 894581223, "node_id": "IC_kwDODFE5qs41Ujnn", "user": {"value": 28565, "label": "maxhawkins"}, "created_at": "2021-08-07T00:57:48Z", "updated_at": "2021-08-07T00:57:48Z", "author_association": "NONE", "body": "Just added two more fixes:\r\n\r\n* Added parsing for rfc 2047 encoded unicode headers\r\n* Body is now stored as TEXT rather than a BLOB regardless of what order the messages are parsed in.\r\n\r\nI was able to run this on my Takeout export and everything seems to work fine. @simonw let me know if this looks good to merge.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 954546309, "label": "Add Gmail takeout mbox import (v2)"}, "performed_via_github_app": null}