{"id": 954546309, "node_id": "MDExOlB1bGxSZXF1ZXN0Njk4NDIzNjY3", "number": 8, "title": "Add Gmail takeout mbox import (v2)", "user": {"value": 28565, "label": "maxhawkins"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 7, "created_at": "2021-07-28T07:05:32Z", "updated_at": "2023-09-08T01:22:49Z", "closed_at": null, "author_association": "FIRST_TIME_CONTRIBUTOR", "pull_request": "dogsheep/google-takeout-to-sqlite/pulls/8", "body": "WIP\r\n\r\nThis PR builds on #5 to continue implementing gmail import support.\r\n\r\nBuilding on @UtahDave's work, these commits add a few performance and bug fixes:\r\n\r\n* Decreased memory overhead for import by manually parsing mbox headers.\r\n* Fixed error where some messages in the mbox would yield a row with NULL in all columns.\r\n\r\nI will send more commits to fix any errors I encounter as I run the importer on my personal takeout data.", "repo": {"value": 206649770, "label": "google-takeout-to-sqlite"}, "type": "pull", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/8/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": 0, "state_reason": null} {"id": 1884499674, "node_id": "PR_kwDODFE5qs5ZtYMc", "number": 13, "title": "use poetry for packages, asdf for versioning, and gh actions for ci", "user": {"value": 150855, "label": "iloveitaly"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 0, "created_at": "2023-09-06T17:59:16Z", "updated_at": "2023-09-06T17:59:16Z", "closed_at": null, "author_association": "FIRST_TIME_CONTRIBUTOR", "pull_request": "dogsheep/google-takeout-to-sqlite/pulls/13", "body": "- build: use poetry for package management, asdf for python version\n- build: cleanup poetry config, add keywords, ignore dist\n- ci: migrate circleci to gh actions\n- fix: dup method definition\n", "repo": {"value": 206649770, "label": "google-takeout-to-sqlite"}, "type": "pull", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/13/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": 0, "state_reason": null} {"id": 1557599877, "node_id": "I_kwDODFE5qs5c1xaF", "number": 12, "title": "location history changes", "user": {"value": 14809320, "label": "gerardrbentley"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 0, "created_at": "2023-01-26T03:57:25Z", "updated_at": "2023-01-26T03:57:25Z", "closed_at": null, "author_association": "NONE", "pull_request": null, "body": "not sure if each download is unique, but I had to change some things to work with the takeout zip I made 2023-01-25\r\n\r\nfilename changed from \"Location History.json\" to \"Records.json\"\r\n\r\n`\"timestampMs\"` is not present, `\"timestamp\"` is roughly iso timestamp\r\n\r\n```py\r\ndef get_timestamp_ms(raw_timestamp):\r\n try:\r\n return datetime.datetime.strptime(raw_timestamp, \"%Y-%m-%dT%H:%M:%SZ\").timestamp()\r\n except ValueError:\r\n return datetime.datetime.strptime(raw_timestamp, \"%Y-%m-%dT%H:%M:%S.%fZ\").timestamp()\r\n\r\ndef save_location_history(db, zf):\r\n location_history = json.load(\r\n zf.open(\"Takeout/Location History/Records.json\")\r\n )\r\n db[\"location_history\"].upsert_all(\r\n (\r\n {\r\n \"id\": id_for_location_history(row),\r\n \"latitude\": row[\"latitudeE7\"] / 1e7,\r\n \"longitude\": row[\"longitudeE7\"] / 1e7,\r\n \"accuracy\": row[\"accuracy\"],\r\n \"timestampMs\": get_timestamp_ms(row[\"timestamp\"]),\r\n \"when\": row[\"timestamp\"],\r\n }\r\n for row in location_history[\"locations\"]\r\n ),\r\n pk=\"id\",\r\n )\r\n\r\n\r\ndef id_for_location_history(row):\r\n # We want an ID that is unique but can be sorted by in\r\n # date order - so we use the isoformat date + the first\r\n # 6 characters of a hash of the JSON\r\n first_six = hashlib.sha1(\r\n json.dumps(row, separators=(\",\", \":\"), sort_keys=True).encode(\"utf8\")\r\n ).hexdigest()[:6]\r\n return \"{}-{}\".format(\r\n row['timestamp'],\r\n first_six,\r\n )\r\n```\r\n\r\nexample locations from mine\r\n\r\n```json\r\n{\r\n \"latitudeE7\": 427220206,\r\n \"longitudeE7\": -923423972,\r\n \"accuracy\": 10,\r\n \"deviceTag\": -1312429967,\r\n \"deviceDesignation\": \"PRIMARY\",\r\n \"timestamp\": \"2019-01-08T23:31:50.867Z\"\r\n }\r\n```\r\n\r\n```json\r\n{\r\n \"latitudeE7\": 427011317,\r\n \"longitudeE7\": -923448300,\r\n \"accuracy\": 5,\r\n \"deviceTag\": -1312429967,\r\n \"deviceDesignation\": \"PRIMARY\",\r\n \"timestamp\": \"2019-01-08T23:33:53Z\"\r\n }, \r\n```", "repo": {"value": 206649770, "label": "google-takeout-to-sqlite"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/12/reactions\", \"total_count\": 2, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 2}", "draft": null, "state_reason": null} {"id": 1250287607, "node_id": "PR_kwDODFE5qs44jvRV", "number": 11, "title": "Update README.md", "user": {"value": 11887, "label": "ashanan"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 0, "created_at": "2022-05-27T03:13:59Z", "updated_at": "2022-05-27T03:13:59Z", "closed_at": null, "author_association": "FIRST_TIME_CONTRIBUTOR", "pull_request": "dogsheep/google-takeout-to-sqlite/pulls/11", "body": "Fix typo", "repo": {"value": 206649770, "label": "google-takeout-to-sqlite"}, "type": "pull", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/11/reactions\", \"total_count\": 1, \"+1\": 1, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": 0, "state_reason": null} {"id": 1123393829, "node_id": "I_kwDODFE5qs5C9aEl", "number": 10, "title": "sqlite3.OperationalError: no such table: main.my_activity", "user": {"value": 69208826, "label": "glxblt14"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 1, "created_at": "2022-02-03T17:59:29Z", "updated_at": "2022-03-20T02:38:07Z", "closed_at": null, "author_association": "NONE", "pull_request": null, "body": "Hello,\r\nWhen i run the command `google-takeout-to-sqlite my-activity db.db takeout-20220203T174446Z-001.zip`, i get this error :\r\n```\r\nTraceback (most recent call last):\r\n File \"c:\\users\\julie\\appdata\\local\\programs\\python\\python39-32\\lib\\runpy.py\", line 197, in _run_module_as_main\r\n return _run_code(code, main_globals, None,\r\n File \"c:\\users\\julie\\appdata\\local\\programs\\python\\python39-32\\lib\\runpy.py\", line 87, in _run_code\r\n exec(code, run_globals)\r\n File \"C:\\Users\\julie\\AppData\\Local\\Programs\\Python\\Python39-32\\Scripts\\google-takeout-to-sqlite.exe\\__main__.py\", line 7, in \r\n File \"c:\\users\\julie\\appdata\\local\\programs\\python\\python39-32\\lib\\site-packages\\click\\core.py\", line 1128, in __call__\r\n return self.main(*args, **kwargs)\r\n File \"c:\\users\\julie\\appdata\\local\\programs\\python\\python39-32\\lib\\site-packages\\click\\core.py\", line 1053, in main\r\n rv = self.invoke(ctx)\r\n File \"c:\\users\\julie\\appdata\\local\\programs\\python\\python39-32\\lib\\site-packages\\click\\core.py\", line 1659, in invoke\r\n return _process_result(sub_ctx.command.invoke(sub_ctx))\r\n File \"c:\\users\\julie\\appdata\\local\\programs\\python\\python39-32\\lib\\site-packages\\click\\core.py\", line 1395, in invoke\r\n return ctx.invoke(self.callback, **ctx.params)\r\n File \"c:\\users\\julie\\appdata\\local\\programs\\python\\python39-32\\lib\\site-packages\\click\\core.py\", line 754, in invoke\r\n return __callback(*args, **kwargs)\r\n File \"c:\\users\\julie\\appdata\\local\\programs\\python\\python39-32\\lib\\site-packages\\google_takeout_to_sqlite\\cli.py\", line 31, in my_activity\r\n utils.save_my_activity(db, zf)\r\n File \"c:\\users\\julie\\appdata\\local\\programs\\python\\python39-32\\lib\\site-packages\\google_takeout_to_sqlite\\utils.py\", line 19, in save_my_activity\r\n db[\"my_activity\"].create_index([\"time\"])\r\n File \"c:\\users\\julie\\appdata\\local\\programs\\python\\python39-32\\lib\\site-packages\\sqlite_utils\\db.py\", line 629, in create_index\r\n self.db.conn.execute(sql)\r\nsqlite3.OperationalError: no such table: main.my_activity\r\n```\r\nThank you for your help\r\nSorry for my bad English\r\nEDIT: i used the json format", "repo": {"value": 206649770, "label": "google-takeout-to-sqlite"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/10/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1046887492, "node_id": "PR_kwDODFE5qs4uMsMJ", "number": 9, "title": "Removed space from filename My Activity.json", "user": {"value": 91880982, "label": "widadmogral"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 0, "created_at": "2021-11-08T00:04:31Z", "updated_at": "2021-11-08T00:04:31Z", "closed_at": null, "author_association": "FIRST_TIME_CONTRIBUTOR", "pull_request": "dogsheep/google-takeout-to-sqlite/pulls/9", "body": "File name from google takeout has no space. The code only runs without error if filename is \"MyActivity.json\" and not \"My Activity.json\". Is it a new change by Google?", "repo": {"value": 206649770, "label": "google-takeout-to-sqlite"}, "type": "pull", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/9/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": 0, "state_reason": null} {"id": 813880401, "node_id": "MDExOlB1bGxSZXF1ZXN0NTc3OTUzNzI3", "number": 5, "title": "WIP: Add Gmail takeout mbox import", "user": {"value": 306240, "label": "UtahDave"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 25, "created_at": "2021-02-22T21:30:40Z", "updated_at": "2021-07-28T07:18:56Z", "closed_at": null, "author_association": "FIRST_TIME_CONTRIBUTOR", "pull_request": "dogsheep/google-takeout-to-sqlite/pulls/5", "body": "WIP\r\n\r\nThis PR adds the ability to import emails from a Gmail mbox export from Google Takeout.\r\n\r\nThis is my first PR to a datasette/dogsheep repo. I've tested this on my personal Google Takeout mbox with ~520,000 emails going back to 2004. This took around ~20 minutes to process.\r\n\r\nTo provide some feedback on the progress of the import I added the \"rich\" python module. I'm happy to remove that if adding a dependency is discouraged. However, I think it makes a nice addition to give feedback on the progress of a long import.\r\n\r\nDo we want to log emails that have errors when trying to import them?\r\n\r\nDealing with encodings with emails is a bit tricky. I'm very open to feedback on how to deal with those better. As well as any other feedback for improvements.", "repo": {"value": 206649770, "label": "google-takeout-to-sqlite"}, "type": "pull", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": 0, "state_reason": null} {"id": 930946817, "node_id": "MDU6SXNzdWU5MzA5NDY4MTc=", "number": 7, "title": "KeyError: 'accuracy' when processing Location History", "user": {"value": 403152, "label": "davidwilemski"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 0, "created_at": "2021-06-27T14:39:43Z", "updated_at": "2021-06-27T14:39:43Z", "closed_at": null, "author_association": "NONE", "pull_request": null, "body": "I'm new to both the dogsheep tools and datasette but have been experimenting a bit the last few days and these are really cool tools!\r\n\r\nI encountered a problem running my Google location history through this tool running the latest release in a docker container:\r\n\r\n```\r\nTraceback (most recent call last):\r\n File \"/usr/local/bin/google-takeout-to-sqlite\", line 8, in \r\n sys.exit(cli())\r\n File \"/usr/local/lib/python3.9/site-packages/click/core.py\", line 829, in __call__\r\n return self.main(*args, **kwargs)\r\n File \"/usr/local/lib/python3.9/site-packages/click/core.py\", line 782, in main\r\n rv = self.invoke(ctx)\r\n File \"/usr/local/lib/python3.9/site-packages/click/core.py\", line 1259, in invoke\r\n return _process_result(sub_ctx.command.invoke(sub_ctx))\r\n File \"/usr/local/lib/python3.9/site-packages/click/core.py\", line 1066, in invoke\r\n return ctx.invoke(self.callback, **ctx.params)\r\n File \"/usr/local/lib/python3.9/site-packages/click/core.py\", line 610, in invoke\r\n return callback(*args, **kwargs)\r\n File \"/usr/local/lib/python3.9/site-packages/google_takeout_to_sqlite/cli.py\", line 49, in my_activity\r\n utils.save_location_history(db, zf)\r\n File \"/usr/local/lib/python3.9/site-packages/google_takeout_to_sqlite/utils.py\", line 27, in save_location_history\r\n db[\"location_history\"].upsert_all(\r\n File \"/usr/local/lib/python3.9/site-packages/sqlite_utils/db.py\", line 1105, in upsert_all\r\n return self.insert_all(\r\n File \"/usr/local/lib/python3.9/site-packages/sqlite_utils/db.py\", line 990, in insert_all\r\n chunk = list(chunk)\r\n File \"/usr/local/lib/python3.9/site-packages/google_takeout_to_sqlite/utils.py\", line 33, in \r\n \"accuracy\": row[\"accuracy\"],\r\nKeyError: 'accuracy'\r\n```\r\n\r\nIt looks like the tool assumes the `accuracy` key will be in every location history entry. \r\n\r\nMy first attempt at a local patch to get myself going was to convert accessing the `accuracy` key to a `.get` instead to hopefully make the row nullable but I wasn't quite sure what `sqlite_utils` would do there. That did work in that the import happened and so I was going to propose a patch that made that change but in updating the existing test to include an entry with a missing accuracy entry, I noticed the expected type of the field appeared to be changing to a string in the test (and from a quick scan through the sqlite_utils code, probably TEXT in the database). Given this change in column type, it seemed that opening an issue first before proposing a fix seemed warranted. It seems the schema would need to be explicitly specified if you wanted a nullable integer column.\r\n\r\nNow that I've done a successful import run using my initial fix of calling `.get` on the row dict, I can see with datasette that I only have 7 data points (out of ~250k) that have a null accuracy column. They are all from 2011-2012 in an import that includes points spanning ~2010-2016 so perhaps another approach might be to filter those entries out during import if it really is that infrequent?\r\n\r\nI'm happy to provide a PR for a fix but figured I'd ask about which direction is preferred first.", "repo": {"value": 206649770, "label": "google-takeout-to-sqlite"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/7/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 778380836, "node_id": "MDU6SXNzdWU3NzgzODA4MzY=", "number": 4, "title": "Feature Request: Gmail", "user": {"value": 203343, "label": "Btibert3"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 5, "created_at": "2021-01-04T21:31:09Z", "updated_at": "2021-03-04T20:54:44Z", "closed_at": null, "author_association": "NONE", "pull_request": null, "body": "From takeout, I only exported my Gmail account. Ideally I could parse this into sqlite via this tool.", "repo": {"value": 206649770, "label": "google-takeout-to-sqlite"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/4/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 821841046, "node_id": "MDU6SXNzdWU4MjE4NDEwNDY=", "number": 6, "title": "Upgrade to latest sqlite-utils", "user": {"value": 9599, "label": "simonw"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 1, "created_at": "2021-03-04T07:21:54Z", "updated_at": "2021-03-04T07:22:51Z", "closed_at": null, "author_association": "MEMBER", "pull_request": null, "body": "This is pinned to v1 at the moment.", "repo": {"value": 206649770, "label": "google-takeout-to-sqlite"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/6/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 769397742, "node_id": "MDU6SXNzdWU3NjkzOTc3NDI=", "number": 3, "title": "sqlite-utils error on takeout import", "user": {"value": 231498, "label": "khimaros"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 0, "created_at": "2020-12-17T01:18:48Z", "updated_at": "2020-12-17T01:19:04Z", "closed_at": null, "author_association": "NONE", "pull_request": null, "body": "```\r\n$ google-takeout-to-sqlite my-activity takeout.db /path/to/zip\r\n...\r\nsqlite3.OperationalError: no such table: main.my_activity\r\n```\r\n\r\nthere is no table create in `utils.py`, unlike other importers such as github-to-sqlite\r\n\r\nadditionally, this package and hackernews-to-sqlite have conflicting `sqlite-utils` dep with datasette and dogsheep-beta", "repo": {"value": 206649770, "label": "google-takeout-to-sqlite"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/3/reactions\", \"total_count\": 2, \"+1\": 2, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 769376447, "node_id": "MDU6SXNzdWU3NjkzNzY0NDc=", "number": 2, "title": "killed by oomkiller on large location-history", "user": {"value": 231498, "label": "khimaros"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 2, "created_at": "2020-12-17T00:32:24Z", "updated_at": "2020-12-17T00:48:32Z", "closed_at": null, "author_association": "NONE", "pull_request": null, "body": "memory seems to grow unbounded and is oom-killed after about 20GB memory usage.\r\n\r\nthis is happening while loading a ~1GB uncompressed location history.", "repo": {"value": 206649770, "label": "google-takeout-to-sqlite"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/2/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 504720731, "node_id": "MDU6SXNzdWU1MDQ3MjA3MzE=", "number": 1, "title": "Add more details on how to request data from google takeout correctly.", "user": {"value": 1055831, "label": "dazzag24"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 0, "created_at": "2019-10-09T15:17:34Z", "updated_at": "2019-10-09T15:17:34Z", "closed_at": null, "author_association": "NONE", "pull_request": null, "body": "The default is to download everything. This can result in an enormous amount of data when you only really need 2 types of data for now:\r\n\r\n- My Activity\r\n- Location History\r\n\r\nIn addition unless you specify that \"My Activity\" is downloaded in JSON format the default is HTML. This then causes the \r\n\r\n`google-takeout-to-sqlite my-activity takeout.db takeout.zip`\r\n\r\ncommand to fail as it only contains html files not json files.\r\n\r\nThanks", "repo": {"value": 206649770, "label": "google-takeout-to-sqlite"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/1/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null}