github
html_url | issue_url | id | node_id | user | created_at | updated_at | author_association | body | reactions | issue | performed_via_github_app |
---|---|---|---|---|---|---|---|---|---|---|---|
https://github.com/dogsheep/google-takeout-to-sqlite/pull/8#issuecomment-1710380941 | https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/8 | 1710380941 | IC_kwDODFE5qs5l8leN | 28565 | 2023-09-07T15:39:59Z | 2023-09-07T15:39:59Z | NONE | > @maxhawkins curious why you didn't use the stdlib `mailbox` to parse the `mbox` files? Mailbox parses the entire mbox into memory. Using the lower level library lets us stream the emails in one at a time to support larger archives. Both libraries are in the stdlib. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
954546309 | |
https://github.com/dogsheep/google-takeout-to-sqlite/pull/8#issuecomment-1003437288 | https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/8 | 1003437288 | IC_kwDODFE5qs47zzzo | 28565 | 2021-12-31T19:06:20Z | 2021-12-31T19:06:20Z | NONE | > @maxhawkins how hard would it be to add an entry to the table that includes the HTML version of the email, if it exists? I just attempted your the PR branch on a very small mbox file, and it worked great. My use case is a research project and I need to access more than just the body plain text. Shouldn't be hard. The easiest way is probably to remove the `if body.content_type == "text/html"` clause from [utils.py:254](https://github.com/dogsheep/google-takeout-to-sqlite/pull/8/commits/8e6d487b697ce2e8ad885acf613a157bfba84c59#diff-25ad9dd1ced1b8bfc37fda8444819c803232c08891e4af3d4064aa205d8174eaR254) and just return content directly without parsing. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
954546309 | |
https://github.com/dogsheep/google-takeout-to-sqlite/pull/8#issuecomment-896378525 | https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/8 | 896378525 | IC_kwDODFE5qs41baad | 28565 | 2021-08-10T23:28:45Z | 2021-08-10T23:28:45Z | NONE | I added parsing of text/html emails using BeautifulSoup. Around half of the emails in my archive don't include a text/plain payload so adding html parsing makes a good chunk of them searchable. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
954546309 | |
https://github.com/dogsheep/google-takeout-to-sqlite/pull/8#issuecomment-894581223 | https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/8 | 894581223 | IC_kwDODFE5qs41Ujnn | 28565 | 2021-08-07T00:57:48Z | 2021-08-07T00:57:48Z | NONE | Just added two more fixes: * Added parsing for rfc 2047 encoded unicode headers * Body is now stored as TEXT rather than a BLOB regardless of what order the messages are parsed in. I was able to run this on my Takeout export and everything seems to work fine. @simonw let me know if this looks good to merge. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
954546309 |