home / github

Menu
  • Search all tables
  • GraphQL API

issue_comments

Table actions
  • GraphQL API for issue_comments

14 rows where "updated_at" is on date 2021-03-04 and user = 9599 sorted by updated_at descending

✖
✖
✖

✎ View and edit SQL

This data as json, CSV (advanced)

Suggested facets: issue_url, created_at (date), updated_at (date)

issue 3

  • WIP: Add Gmail takeout mbox import 12
  • fix small typo 1
  • Upgrade to latest sqlite-utils 1

author_association 2

  • MEMBER 13
  • OWNER 1

user 1

  • simonw · 14 ✖
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions issue performed_via_github_app
790695126 https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790695126 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 MDEyOklzc3VlQ29tbWVudDc5MDY5NTEyNg== simonw 9599 2021-03-04T15:20:42Z 2021-03-04T15:20:42Z MEMBER

I'm not sure why but my most recent import, when displayed in Datasette, looks like this:

Sorting by id in the opposite order gives me the data I would expect - so it looks like a bunch of null/blank messages are being imported at some point and showing up first due to ID ordering.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
WIP: Add Gmail takeout mbox import 813880401  
790693674 https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790693674 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 MDEyOklzc3VlQ29tbWVudDc5MDY5MzY3NA== simonw 9599 2021-03-04T15:18:36Z 2021-03-04T15:18:36Z MEMBER

I imported my 10GB mbox with 750,000 emails in it, ran this tool (with a hacked fix for the blob column problem) - and now a search that returns 92 results takes 25.37ms! This is fantastic.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
WIP: Add Gmail takeout mbox import 813880401  
790669767 https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790669767 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 MDEyOklzc3VlQ29tbWVudDc5MDY2OTc2Nw== simonw 9599 2021-03-04T14:46:06Z 2021-03-04T14:46:06Z MEMBER

Solution could be to pre-process that string by splitting on ( and dropping everything afterwards, assuming that the (...) bit isn't necessary for correctly parsing the date.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
WIP: Add Gmail takeout mbox import 813880401  
790668263 https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790668263 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 MDEyOklzc3VlQ29tbWVudDc5MDY2ODI2Mw== simonw 9599 2021-03-04T14:43:58Z 2021-03-04T14:43:58Z MEMBER

I added this code to output a message ID on errors: diff print("Errors: {}".format(num_errors)) print(traceback.format_exc()) + print("Message-Id: {}".format(email.get("Message-Id", "None"))) continue Having found a message ID that had an error, I ran this command to see the context:

rg --text --context 20 '44F289B0.000001.02100@SCHWARZE-DWFXMI' ~/gmail.mbox

This was for the following error: File "/Users/simon/Dropbox/Development/google-takeout-to-sqlite/google_takeout_to_sqlite/utils.py", line 102, in get_mbox message["date"] = get_message_date(email.get("Date"), email.get_from()) File "/Users/simon/Dropbox/Development/google-takeout-to-sqlite/google_takeout_to_sqlite/utils.py", line 178, in get_message_date datetime_tuple = email.utils.parsedate_tz(mail_date) File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/email/_parseaddr.py", line 50, in parsedate_tz res = _parsedate_tz(data) File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/email/_parseaddr.py", line 69, in _parsedate_tz data = data.split() AttributeError: 'Header' object has no attribute 'split' Here's what I spotted in the ripgrep output: 177133570:Message-Id: <44F289B0.000001.02100@SCHWARZE-DWFXMI> 177133571-Date: Mon, 28 Aug 2006 08:14:08 +0200 (Westeurop�ische Sommerzeit) 177133572-X-Mailer: IncrediMail (5002253) So it could it be that _parsedate_tz is having trouble with that Mon, 28 Aug 2006 08:14:08 +0200 (Westeurop�ische Sommerzeit) string.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
WIP: Add Gmail takeout mbox import 813880401  
790384087 https://github.com/dogsheep/google-takeout-to-sqlite/issues/6#issuecomment-790384087 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/6 MDEyOklzc3VlQ29tbWVudDc5MDM4NDA4Nw== simonw 9599 2021-03-04T07:22:51Z 2021-03-04T07:22:51Z MEMBER

3 also mentions the conflicting version with other tools.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Upgrade to latest sqlite-utils 821841046  
790380839 https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790380839 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 MDEyOklzc3VlQ29tbWVudDc5MDM4MDgzOQ== simonw 9599 2021-03-04T07:17:05Z 2021-03-04T07:17:05Z MEMBER

Looks like you're doing this: python elif message.get_content_type() == "text/plain": body = message.get_payload(decode=True) So presumably that decodes to a unicode string?

I imagine the reason the column is a BLOB for me is that sqlite-utils determines the column type based on the first batch of items - https://github.com/simonw/sqlite-utils/blob/09c3386f55f766b135b6a1c00295646c4ae29bec/sqlite_utils/db.py#L1927-L1928 - and I got unlucky and had something in my first batch that wasn't a unicode string.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
WIP: Add Gmail takeout mbox import 813880401  
790379629 https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790379629 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 MDEyOklzc3VlQ29tbWVudDc5MDM3OTYyOQ== simonw 9599 2021-03-04T07:14:41Z 2021-03-04T07:14:41Z MEMBER

Confirmed: removing the len() call does not speed things up, so it's reading through the entire file for some other purpose too.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
WIP: Add Gmail takeout mbox import 813880401  
790378658 https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790378658 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 MDEyOklzc3VlQ29tbWVudDc5MDM3ODY1OA== simonw 9599 2021-03-04T07:12:48Z 2021-03-04T07:12:48Z MEMBER

It looks like the body is being loaded into a BLOB column - so in Datasette default it looks like this:

If I datasette install datasette-render-binary and then try again I get this:

It would be great if we could store the body as unicode text instead. May have to do something clever to decode it based on some kind of charset header?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
WIP: Add Gmail takeout mbox import 813880401  
790373024 https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790373024 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 MDEyOklzc3VlQ29tbWVudDc5MDM3MzAyNA== simonw 9599 2021-03-04T07:01:58Z 2021-03-04T07:04:06Z MEMBER

I got 9 warnings that look like this: Errors: 1 Traceback (most recent call last): File "/Users/simon/Dropbox/Development/google-takeout-to-sqlite/google_takeout_to_sqlite/utils.py", line 103, in get_mbox message["date"] = get_message_date(email.get("Date"), email.get_from()) File "/Users/simon/Dropbox/Development/google-takeout-to-sqlite/google_takeout_to_sqlite/utils.py", line 167, in get_message_date datetime_tuple = email.utils.parsedate_tz(mail_date) File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/email/_parseaddr.py", line 50, in parsedate_tz res = _parsedate_tz(data) File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/email/_parseaddr.py", line 69, in _parsedate_tz data = data.split() AttributeError: 'Header' object has no attribute 'split' It would be useful if those warnings told me the message ID (or similar) of the affected message so I could grep for it in the mbox and see what was going on.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
WIP: Add Gmail takeout mbox import 813880401  
790372621 https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790372621 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 MDEyOklzc3VlQ29tbWVudDc5MDM3MjYyMQ== simonw 9599 2021-03-04T07:01:18Z 2021-03-04T07:01:18Z MEMBER

I'm not sure if it would work, but there is an alternative pattern for showing a progress bar against a really large file that I've used in healthkit-to-sqlite - you set the progress bar size to the size of the file in bytes, then update a counter as you read the file.

https://github.com/dogsheep/healthkit-to-sqlite/blob/3eb2b06bfe3b4faaf10e9cf9dfcb28e3d16c14ff/healthkit_to_sqlite/cli.py#L24-L57 and https://github.com/dogsheep/healthkit-to-sqlite/blob/3eb2b06bfe3b4faaf10e9cf9dfcb28e3d16c14ff/healthkit_to_sqlite/utils.py#L4-L19 (the progress_callback() bit) is where that happens.

It can be a bit of a convoluted pattern, and I'm not at all sure it would work for mbox files since it looks like that library has other reasons it needs to do a file scan rather than streaming it through one chunk of bytes at a time. So I imagine this would not work here.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
WIP: Add Gmail takeout mbox import 813880401  
790370485 https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790370485 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 MDEyOklzc3VlQ29tbWVudDc5MDM3MDQ4NQ== simonw 9599 2021-03-04T06:57:25Z 2021-03-04T06:57:48Z MEMBER

The command takes quite a while to start running, presumably because this line causes it to have to scan the WHOLE file in order to generate a count:

https://github.com/dogsheep/google-takeout-to-sqlite/blob/a3de045eba0fae4b309da21aa3119102b0efc576/google_takeout_to_sqlite/utils.py#L66-L67

I'm fine with waiting though. It's not like this is a command people run every day - and without that count we can't show a progress bar, which seems pretty important for a process that takes this long.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
WIP: Add Gmail takeout mbox import 813880401  
790369076 https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790369076 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 MDEyOklzc3VlQ29tbWVudDc5MDM2OTA3Ng== simonw 9599 2021-03-04T06:54:46Z 2021-03-04T06:54:46Z MEMBER

The Rich-powered progress bar is pretty:

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
WIP: Add Gmail takeout mbox import 813880401  
790312268 https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790312268 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 MDEyOklzc3VlQ29tbWVudDc5MDMxMjI2OA== simonw 9599 2021-03-04T05:48:16Z 2021-03-04T05:48:16Z MEMBER

Wow, my mbox is a 10.35 GB download!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
WIP: Add Gmail takeout mbox import 813880401  
790311215 https://github.com/simonw/datasette/pull/1243#issuecomment-790311215 https://api.github.com/repos/simonw/datasette/issues/1243 MDEyOklzc3VlQ29tbWVudDc5MDMxMTIxNQ== simonw 9599 2021-03-04T05:45:57Z 2021-03-04T05:45:57Z OWNER

Thanks!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
fix small typo 815955014  

Advanced export

JSON shape: default, array, newline-delimited, object

CSV options:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
, [performed_via_github_app] TEXT);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);
Powered by Datasette · Queries took 867.358ms · About: github-to-sqlite
  • Sort ascending
  • Sort descending
  • Facet by this
  • Hide this column
  • Show all columns
  • Show not-blank rows