461 rows where author_association = "MEMBER" sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions issue performed_via_github_app
844250232 https://github.com/dogsheep/github-to-sqlite/pull/59#issuecomment-844250232 https://api.github.com/repos/dogsheep/github-to-sqlite/issues/59 MDEyOklzc3VlQ29tbWVudDg0NDI1MDIzMg== simonw 9599 2021-05-19T16:08:10Z 2021-05-19T16:08:10Z MEMBER

Thanks for catching this.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Remove unneeded exists=True for -a/--auth flag. 771872303  
844249385 https://github.com/dogsheep/github-to-sqlite/pull/61#issuecomment-844249385 https://api.github.com/repos/dogsheep/github-to-sqlite/issues/61 MDEyOklzc3VlQ29tbWVudDg0NDI0OTM4NQ== simonw 9599 2021-05-19T16:07:06Z 2021-05-19T16:07:06Z MEMBER

Thanks!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
fixing typo in get cli help text 797108702  
790695126 https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790695126 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 MDEyOklzc3VlQ29tbWVudDc5MDY5NTEyNg== simonw 9599 2021-03-04T15:20:42Z 2021-03-04T15:20:42Z MEMBER

I'm not sure why but my most recent import, when displayed in Datasette, looks like this:

https://user-images.githubusercontent.com/9599/109985836-0ab00080-7cba-11eb-97d5-0631a0835b61.png

Sorting by id in the opposite order gives me the data I would expect - so it looks like a bunch of null/blank messages are being imported at some point and showing up first due to ID ordering.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
WIP: Add Gmail takeout mbox import 813880401  
790693674 https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790693674 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 MDEyOklzc3VlQ29tbWVudDc5MDY5MzY3NA== simonw 9599 2021-03-04T15:18:36Z 2021-03-04T15:18:36Z MEMBER

I imported my 10GB mbox with 750,000 emails in it, ran this tool (with a hacked fix for the blob column problem) - and now a search that returns 92 results takes 25.37ms! This is fantastic.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
WIP: Add Gmail takeout mbox import 813880401  
790669767 https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790669767 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 MDEyOklzc3VlQ29tbWVudDc5MDY2OTc2Nw== simonw 9599 2021-03-04T14:46:06Z 2021-03-04T14:46:06Z MEMBER

Solution could be to pre-process that string by splitting on ( and dropping everything afterwards, assuming that the (...) bit isn't necessary for correctly parsing the date.
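
A minimal sketch of that pre-processing step, assuming the value may arrive as either a plain string or an email.header.Header object (hence the str() call). The helper name strip_trailing_comment is hypothetical and is not part of google-takeout-to-sqlite.

```python
import email.utils


def strip_trailing_comment(date_value):
    """Drop everything from the first "(" onwards in a Date header value.

    str() coerces Header objects to text so that .split() is available.
    """
    text = str(date_value)
    return text.split("(", 1)[0].strip()


raw = "Mon, 28 Aug 2006 08:14:08 +0200 (Westeuropäische Sommerzeit)"
# parsedate_tz() returns a 10-tuple; the last item is the UTC offset in seconds.
parsed = email.utils.parsedate_tz(strip_trailing_comment(raw))
```

This only helps if the parenthesised part really is an optional comment, which is the assumption stated above.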

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
WIP: Add Gmail takeout mbox import 813880401  
790668263 https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790668263 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 MDEyOklzc3VlQ29tbWVudDc5MDY2ODI2Mw== simonw 9599 2021-03-04T14:43:58Z 2021-03-04T14:43:58Z MEMBER

I added this code to output a message ID on errors:

             print("Errors: {}".format(num_errors))
             print(traceback.format_exc())
+            print("Message-Id: {}".format(email.get("Message-Id", "None")))
             continue

Having found a message ID that had an error, I ran this command to see the context:

rg --text --context 20 '44F289B0.000001.02100@SCHWARZE-DWFXMI' ~/gmail.mbox

This was for the following error:

  File "/Users/simon/Dropbox/Development/google-takeout-to-sqlite/google_takeout_to_sqlite/utils.py", line 102, in get_mbox
    message["date"] = get_message_date(email.get("Date"), email.get_from())
  File "/Users/simon/Dropbox/Development/google-takeout-to-sqlite/google_takeout_to_sqlite/utils.py", line 178, in get_message_date
    datetime_tuple = email.utils.parsedate_tz(mail_date)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/email/_parseaddr.py", line 50, in parsedate_tz
    res = _parsedate_tz(data)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/email/_parseaddr.py", line 69, in _parsedate_tz
    data = data.split()
AttributeError: 'Header' object has no attribute 'split'

Here's what I spotted in the ripgrep output:

177133570:Message-Id: <44F289B0.000001.02100@SCHWARZE-DWFXMI>
177133571-Date: Mon, 28 Aug 2006 08:14:08 +0200 (Westeurop�ische Sommerzeit)
177133572-X-Mailer: IncrediMail (5002253)

So it could be that _parsedate_tz is having trouble with that Mon, 28 Aug 2006 08:14:08 +0200 (Westeurop�ische Sommerzeit) string.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
WIP: Add Gmail takeout mbox import 813880401  
790384087 https://github.com/dogsheep/google-takeout-to-sqlite/issues/6#issuecomment-790384087 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/6 MDEyOklzc3VlQ29tbWVudDc5MDM4NDA4Nw== simonw 9599 2021-03-04T07:22:51Z 2021-03-04T07:22:51Z MEMBER

#3 also mentions the conflicting version with other tools.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Upgrade to latest sqlite-utils 821841046  
790380839 https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790380839 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 MDEyOklzc3VlQ29tbWVudDc5MDM4MDgzOQ== simonw 9599 2021-03-04T07:17:05Z 2021-03-04T07:17:05Z MEMBER

Looks like you're doing this:

    elif message.get_content_type() == "text/plain":
        body = message.get_payload(decode=True)

So presumably that decodes to a unicode string?

I imagine the reason the column is a BLOB for me is that sqlite-utils determines the column type based on the first batch of items - https://github.com/simonw/sqlite-utils/blob/09c3386f55f766b135b6a1c00295646c4ae29bec/sqlite_utils/db.py#L1927-L1928 - and I got unlucky and had something in my first batch that wasn't a unicode string.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
WIP: Add Gmail takeout mbox import 813880401  
790379629 https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790379629 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 MDEyOklzc3VlQ29tbWVudDc5MDM3OTYyOQ== simonw 9599 2021-03-04T07:14:41Z 2021-03-04T07:14:41Z MEMBER

Confirmed: removing the len() call does not speed things up, so it's reading through the entire file for some other purpose too.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
WIP: Add Gmail takeout mbox import 813880401  
790378658 https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790378658 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 MDEyOklzc3VlQ29tbWVudDc5MDM3ODY1OA== simonw 9599 2021-03-04T07:12:48Z 2021-03-04T07:12:48Z MEMBER

It looks like the body is being loaded into a BLOB column - so in Datasette by default it looks like this:

https://user-images.githubusercontent.com/9599/109924808-b4b96980-7c75-11eb-8c9e-307f2ae32d5a.png

If I datasette install datasette-render-binary and then try again I get this:

https://user-images.githubusercontent.com/9599/109924944-ea5e5280-7c75-11eb-9a32-404f3d68455f.png

It would be great if we could store the body as unicode text instead. May have to do something clever to decode it based on some kind of charset header?
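
One way the charset idea might look, as a rough sketch using only the standard library email package. decode_body is a hypothetical helper, and falling back to UTF-8 with replacement characters is an assumption rather than anything the PR does.

```python
import email


def decode_body(part):
    """Return the payload of a message part as text rather than bytes."""
    payload = part.get_payload(decode=True)  # bytes, or None for multipart
    if payload is None:
        return ""
    # get_content_charset() reads the charset parameter of Content-Type.
    charset = part.get_content_charset() or "utf-8"
    try:
        return payload.decode(charset, errors="replace")
    except LookupError:
        # The header named a charset Python does not know about.
        return payload.decode("utf-8", errors="replace")


msg = email.message_from_bytes(
    b"Content-Type: text/plain; charset=iso-8859-1\n\nWesteurop\xe4ische"
)
body = decode_body(msg)
```

Storing the returned str should let the column come out as TEXT, provided every row goes through the same decoding.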

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
WIP: Add Gmail takeout mbox import 813880401  
790373024 https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790373024 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 MDEyOklzc3VlQ29tbWVudDc5MDM3MzAyNA== simonw 9599 2021-03-04T07:01:58Z 2021-03-04T07:04:06Z MEMBER

I got 9 warnings that look like this:

Errors: 1
Traceback (most recent call last):
  File "/Users/simon/Dropbox/Development/google-takeout-to-sqlite/google_takeout_to_sqlite/utils.py", line 103, in get_mbox
    message["date"] = get_message_date(email.get("Date"), email.get_from())
  File "/Users/simon/Dropbox/Development/google-takeout-to-sqlite/google_takeout_to_sqlite/utils.py", line 167, in get_message_date
    datetime_tuple = email.utils.parsedate_tz(mail_date)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/email/_parseaddr.py", line 50, in parsedate_tz
    res = _parsedate_tz(data)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/email/_parseaddr.py", line 69, in _parsedate_tz
    data = data.split()
AttributeError: 'Header' object has no attribute 'split'

It would be useful if those warnings told me the message ID (or similar) of the affected message so I could grep for it in the mbox and see what was going on.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
WIP: Add Gmail takeout mbox import 813880401  
790372621 https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790372621 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 MDEyOklzc3VlQ29tbWVudDc5MDM3MjYyMQ== simonw 9599 2021-03-04T07:01:18Z 2021-03-04T07:01:18Z MEMBER

I'm not sure if it would work, but there is an alternative pattern for showing a progress bar against a really large file that I've used in healthkit-to-sqlite - you set the progress bar size to the size of the file in bytes, then update a counter as you read the file.

https://github.com/dogsheep/healthkit-to-sqlite/blob/3eb2b06bfe3b4faaf10e9cf9dfcb28e3d16c14ff/healthkit_to_sqlite/cli.py#L24-L57 and https://github.com/dogsheep/healthkit-to-sqlite/blob/3eb2b06bfe3b4faaf10e9cf9dfcb28e3d16c14ff/healthkit_to_sqlite/utils.py#L4-L19 (the progress_callback() bit) is where that happens.

It can be a bit of a convoluted pattern, and I'm not at all sure it would work for mbox files since it looks like that library has other reasons it needs to do a file scan rather than streaming it through one chunk of bytes at a time. So I imagine this would not work here.
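
For reference, the byte-counting pattern can be sketched roughly like this. read_with_progress and its parameters are hypothetical names; in a real command the callback would probably be the update method of a progress bar whose length was set to the file size in bytes.

```python
import io


def read_with_progress(fp, progress_callback, chunk_size=1024 * 1024):
    """Yield chunks from a binary file object, reporting bytes read so far."""
    while True:
        chunk = fp.read(chunk_size)
        if not chunk:
            break
        # Advance the progress bar by the number of bytes just consumed.
        progress_callback(len(chunk))
        yield chunk


# Usage with an in-memory file and a list standing in for the progress bar.
seen = []
chunks = list(read_with_progress(io.BytesIO(b"x" * 2500), seen.append, chunk_size=1000))
```

This only works when the consumer can process the file as a stream, which is exactly the doubt raised above for mbox files.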

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
WIP: Add Gmail takeout mbox import 813880401  
790370485 https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790370485 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 MDEyOklzc3VlQ29tbWVudDc5MDM3MDQ4NQ== simonw 9599 2021-03-04T06:57:25Z 2021-03-04T06:57:48Z MEMBER

The command takes quite a while to start running, presumably because this line causes it to have to scan the WHOLE file in order to generate a count:

https://github.com/dogsheep/google-takeout-to-sqlite/blob/a3de045eba0fae4b309da21aa3119102b0efc576/google_takeout_to_sqlite/utils.py#L66-L67

I'm fine with waiting though. It's not like this is a command people run every day - and without that count we can't show a progress bar, which seems pretty important for a process that takes this long.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
WIP: Add Gmail takeout mbox import 813880401  
790369076 https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790369076 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 MDEyOklzc3VlQ29tbWVudDc5MDM2OTA3Ng== simonw 9599 2021-03-04T06:54:46Z 2021-03-04T06:54:46Z MEMBER

The Rich-powered progress bar is pretty:

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
WIP: Add Gmail takeout mbox import 813880401  
790312268 https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790312268 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 MDEyOklzc3VlQ29tbWVudDc5MDMxMjI2OA== simonw 9599 2021-03-04T05:48:16Z 2021-03-04T05:48:16Z MEMBER

Wow, my mbox is a 10.35 GB download!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
WIP: Add Gmail takeout mbox import 813880401  
786925280 https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-786925280 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 MDEyOklzc3VlQ29tbWVudDc4NjkyNTI4MA== simonw 9599 2021-02-26T22:23:10Z 2021-02-26T22:23:10Z MEMBER

Thanks!

I requested my Gmail export from takeout - once that arrives I'll test it against this and then merge the PR.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
WIP: Add Gmail takeout mbox import 813880401  
777839351 https://github.com/dogsheep/evernote-to-sqlite/pull/10#issuecomment-777839351 https://api.github.com/repos/dogsheep/evernote-to-sqlite/issues/10 MDEyOklzc3VlQ29tbWVudDc3NzgzOTM1MQ== simonw 9599 2021-02-11T22:37:55Z 2021-02-11T22:37:55Z MEMBER

I've merged these changes by hand now, thanks!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
BugFix for encoding and not update info. 770712149  
777827396 https://github.com/dogsheep/evernote-to-sqlite/issues/7#issuecomment-777827396 https://api.github.com/repos/dogsheep/evernote-to-sqlite/issues/7 MDEyOklzc3VlQ29tbWVudDc3NzgyNzM5Ng== simonw 9599 2021-02-11T22:13:14Z 2021-02-11T22:13:14Z MEMBER

My best guess is that you have an older version of sqlite-utils installed here - the replace=True argument was added in version 2.0. I've bumped the dependency in setup.py.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
evernote-to-sqlite on windows 10 give this error: TypeError: insert() got an unexpected keyword argument 'replace' 743297582  
777821383 https://github.com/dogsheep/evernote-to-sqlite/issues/9#issuecomment-777821383 https://api.github.com/repos/dogsheep/evernote-to-sqlite/issues/9 MDEyOklzc3VlQ29tbWVudDc3NzgyMTM4Mw== simonw 9599 2021-02-11T22:01:28Z 2021-02-11T22:01:28Z MEMBER

Aha! I think I've figured out what's going on here.

The CData blocks containing the notes look like this:

<![CDATA[<!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml2.dtd"><en-note><div>This note includes two images.</div><div><br /></div>...

The DTD at http://xml.evernote.com/pub/enml2.dtd includes some entities:

<!--=========== External character mnemonic entities ===================-->

<!ENTITY % HTMLlat1 PUBLIC
   "-//W3C//ENTITIES Latin 1 for XHTML//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent">
%HTMLlat1;

<!ENTITY % HTMLsymbol PUBLIC
   "-//W3C//ENTITIES Symbols for XHTML//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent">
%HTMLsymbol;

<!ENTITY % HTMLspecial PUBLIC
   "-//W3C//ENTITIES Special for XHTML//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent">
%HTMLspecial;

So I need to be able to handle all of those different entities. I think I can do that using html.entities.entitydefs from the Python standard library, which looks a bit like this:

{'Aacute': 'Á',
 'aacute': 'á',
 'Aacute;': 'Á',
 'aacute;': 'á',
 'Abreve;': 'Ă',
 'abreve;': 'ă',
 'ac;': '∾',
 'acd;': '∿',
# ...
}
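
A rough sketch of how that could work, assuming the entities are expanded in the text before it reaches the XML parser. expand_named_entities is a hypothetical helper. It uses html.entities.entitydefs, whose keys are bare names without the trailing semicolon, and leaves the five entities that XML already defines untouched.

```python
import re
from html.entities import entitydefs

# These are understood by every XML parser, so they must stay escaped.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
_entity_re = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def expand_named_entities(xml_text):
    """Replace named HTML entities such as &scaron; with the character itself."""

    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in entitydefs:
            return match.group(0)  # leave unknown or XML-native entities alone
        return entitydefs[name]

    return _entity_re.sub(replace, xml_text)


cleaned = expand_named_entities("<div>&scaron; &amp; &nbsp;</div>")
```
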
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
ParseError: undefined entity &scaron; 748372469  
777798330 https://github.com/dogsheep/evernote-to-sqlite/issues/11#issuecomment-777798330 https://api.github.com/repos/dogsheep/evernote-to-sqlite/issues/11 MDEyOklzc3VlQ29tbWVudDc3Nzc5ODMzMA== simonw 9599 2021-02-11T21:18:58Z 2021-02-11T21:18:58Z MEMBER

Thanks for the fix!

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
XML parse error 792851444  
770071568 https://github.com/dogsheep/github-to-sqlite/issues/60#issuecomment-770071568 https://api.github.com/repos/dogsheep/github-to-sqlite/issues/60 MDEyOklzc3VlQ29tbWVudDc3MDA3MTU2OA== simonw 9599 2021-01-29T21:56:15Z 2021-01-29T21:56:15Z MEMBER

I really like the way you're using pipes here - really smart. It's similar to how I build the demo database in this GitHub Actions workflow:

https://github.com/dogsheep/github-to-sqlite/blob/62dfd3bc4014b108200001ef4bc746feb6f33b45/.github/workflows/deploy-demo.yml#L52-L82

twitter-to-sqlite actually has a mechanism for doing this kind of thing, documented at https://github.com/dogsheep/twitter-to-sqlite#providing-input-from-a-sql-query-with---sql-and---attach

It lets you do things like:

$ twitter-to-sqlite users-lookup my.db --sql="select follower_id from following" --ids

Maybe I should add something similar to github-to-sqlite? Feels like it could be really useful.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Use Data from SQLite in other commands 797097140  
769957751 https://github.com/dogsheep/twitter-to-sqlite/issues/56#issuecomment-769957751 https://api.github.com/repos/dogsheep/twitter-to-sqlite/issues/56 MDEyOklzc3VlQ29tbWVudDc2OTk1Nzc1MQ== simonw 9599 2021-01-29T17:59:40Z 2021-01-29T17:59:40Z MEMBER

This is interesting - how did you create that initial table? Was this using the twitter-to-sqlite import archive.db ~/Downloads/twitter-2019-06-25-b31f2.zip command, or something else?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Not all quoted statuses get fetched? 796736607  
761967094 https://github.com/dogsheep/swarm-to-sqlite/issues/11#issuecomment-761967094 https://api.github.com/repos/dogsheep/swarm-to-sqlite/issues/11 MDEyOklzc3VlQ29tbWVudDc2MTk2NzA5NA== simonw 9599 2021-01-18T04:11:13Z 2021-01-18T04:11:13Z MEMBER

I just got a similar error:

  File "/home/dogsheep/datasette-venv/lib/python3.8/site-packages/swarm_to_sqlite/utils.py", line 79, in save_checkin
    checkins_table.m2m("users", user, m2m_table="with", pk="id")
  File "/home/dogsheep/datasette-venv/lib/python3.8/site-packages/sqlite_utils/db.py", line 2048, in m2m
    id = other_table.insert(record, pk=pk, replace=True).last_pk
  File "/home/dogsheep/datasette-venv/lib/python3.8/site-packages/sqlite_utils/db.py", line 1781, in insert
    return self.insert_all(
  File "/home/dogsheep/datasette-venv/lib/python3.8/site-packages/sqlite_utils/db.py", line 1899, in insert_all
    self.insert_chunk(
  File "/home/dogsheep/datasette-venv/lib/python3.8/site-packages/sqlite_utils/db.py", line 1709, in insert_chunk
    result = self.db.execute(query, params)
  File "/home/dogsheep/datasette-venv/lib/python3.8/site-packages/sqlite_utils/db.py", line 226, in execute
    return self.conn.execute(sql, parameters)
pysqlite3.dbapi2.OperationalError: table users has no column named countryCode
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Error thrown: sqlite3.OperationalError: table users has no column named lastName 743400216  
748426877 https://github.com/dogsheep/dogsheep-beta/issues/31#issuecomment-748426877 https://api.github.com/repos/dogsheep/dogsheep-beta/issues/31 MDEyOklzc3VlQ29tbWVudDc0ODQyNjg3Nw== simonw 9599 2020-12-19T06:16:11Z 2020-12-19T06:16:11Z MEMBER

Here's why:

if "fts5" in str(e):

But the error being raised here is:

sqlite3.OperationalError: no such column: to

I'm going to attempt the escaped one on every error.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Searching for "github-to-sqlite" throws an error 771316301  
748426663 https://github.com/dogsheep/dogsheep-beta/issues/31#issuecomment-748426663 https://api.github.com/repos/dogsheep/dogsheep-beta/issues/31 MDEyOklzc3VlQ29tbWVudDc0ODQyNjY2Mw== simonw 9599 2020-12-19T06:14:06Z 2020-12-19T06:14:06Z MEMBER
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Searching for "github-to-sqlite" throws an error 771316301  
748426581 https://github.com/dogsheep/dogsheep-beta/issues/31#issuecomment-748426581 https://api.github.com/repos/dogsheep/dogsheep-beta/issues/31 MDEyOklzc3VlQ29tbWVudDc0ODQyNjU4MQ== simonw 9599 2020-12-19T06:13:17Z 2020-12-19T06:13:17Z MEMBER

One fix for this could be to try running the raw query, but if it throws an error run it again with the query escaped.
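
That retry could be sketched as follows. Both function names are hypothetical, run_query stands in for whatever executes the search, and the quoting rule in escape_fts (wrap each term in double quotes) is an assumption about what "escaped" means here.

```python
import sqlite3


def escape_fts(query):
    """Quote every whitespace-separated term so it is treated as a literal."""
    terms = query.split()
    return " ".join('"{}"'.format(term.replace('"', '""')) for term in terms)


def search_with_fallback(run_query, q):
    """Try the raw query first; on any SQLite error retry with it escaped."""
    try:
        return run_query(q)
    except sqlite3.OperationalError:
        return run_query(escape_fts(q))
```

Users who type valid advanced FTS syntax still get it honoured, while a query like github-to-sqlite falls through to the quoted form.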

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Searching for "github-to-sqlite" throws an error 771316301  
748426501 https://github.com/dogsheep/dogsheep-beta/issues/31#issuecomment-748426501 https://api.github.com/repos/dogsheep/dogsheep-beta/issues/31 MDEyOklzc3VlQ29tbWVudDc0ODQyNjUwMQ== simonw 9599 2020-12-19T06:12:22Z 2020-12-19T06:12:22Z MEMBER

I deliberately added support for advanced FTS in https://github.com/dogsheep/dogsheep-beta/commit/cbb2491b85d7ff416d6d429b60109e6c2d6d50b9 for #13 but that's the cause of this bug.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Searching for "github-to-sqlite" throws an error 771316301  
747126777 https://github.com/dogsheep/google-takeout-to-sqlite/issues/2#issuecomment-747126777 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/2 MDEyOklzc3VlQ29tbWVudDc0NzEyNjc3Nw== simonw 9599 2020-12-17T00:36:52Z 2020-12-17T00:36:52Z MEMBER

The memory profiler tricks I used in https://github.com/dogsheep/healthkit-to-sqlite/issues/7 could help figure out what's going on here.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
killed by oomkiller on large location-history 769376447  
747034481 https://github.com/dogsheep/dogsheep-beta/issues/29#issuecomment-747034481 https://api.github.com/repos/dogsheep/dogsheep-beta/issues/29 MDEyOklzc3VlQ29tbWVudDc0NzAzNDQ4MQ== simonw 9599 2020-12-16T21:17:05Z 2020-12-16T21:17:05Z MEMBER

I'm just going to add q for the moment.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Add search highlighting snippets 724759588  
747031608 https://github.com/dogsheep/dogsheep-beta/issues/29#issuecomment-747031608 https://api.github.com/repos/dogsheep/dogsheep-beta/issues/29 MDEyOklzc3VlQ29tbWVudDc0NzAzMTYwOA== simonw 9599 2020-12-16T21:15:18Z 2020-12-16T21:15:18Z MEMBER

Should I pass any other details to the display_sql here as well?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Add search highlighting snippets 724759588  
747030964 https://github.com/dogsheep/dogsheep-beta/issues/29#issuecomment-747030964 https://api.github.com/repos/dogsheep/dogsheep-beta/issues/29 MDEyOklzc3VlQ29tbWVudDc0NzAzMDk2NA== simonw 9599 2020-12-16T21:14:54Z 2020-12-16T21:14:54Z MEMBER
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Add search highlighting snippets 724759588  
747029636 https://github.com/dogsheep/dogsheep-beta/issues/29#issuecomment-747029636 https://api.github.com/repos/dogsheep/dogsheep-beta/issues/29 MDEyOklzc3VlQ29tbWVudDc0NzAyOTYzNg== simonw 9599 2020-12-16T21:14:03Z 2020-12-16T21:14:03Z MEMBER

I think I can do this as a cunning trick in display_sql. Consider this example query: https://til.simonwillison.net/tils?sql=select%0D%0A++path%2C%0D%0A++snippet%28til_fts%2C+-1%2C+%27b4de2a49c8%27%2C+%278c94a2ed4b%27%2C+%27...%27%2C+60%29+as+snippet%0D%0Afrom%0D%0A++til%0D%0A++join+til_fts+on+til.rowid+%3D+til_fts.rowid%0D%0Awhere%0D%0A++til_fts+match+escape_fts%28%3Aq%29%0D%0A++and+path+%3D+%27asgi_lifespan-test-httpx.md%27%0D%0A&q=pytest

select
  path,
  snippet(til_fts, -1, 'b4de2a49c8', '8c94a2ed4b', '...', 60) as snippet
from
  til
  join til_fts on til.rowid = til_fts.rowid
where
  til_fts match escape_fts(:q)
  and path = 'asgi_lifespan-test-httpx.md'

The and path = 'asgi_lifespan-test-httpx.md' bit means we only get back a specific document - but the snippet highlighting is applied to it.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Add search highlighting snippets 724759588  
746735889 https://github.com/dogsheep/github-to-sqlite/issues/58#issuecomment-746735889 https://api.github.com/repos/dogsheep/github-to-sqlite/issues/58 MDEyOklzc3VlQ29tbWVudDc0NjczNTg4OQ== simonw 9599 2020-12-16T17:59:50Z 2020-12-16T17:59:50Z MEMBER

I don't want to add a full HTML parser (like BeautifulSoup) as a dependency for this feature. Since the HTML comes from a single, trusted source (GitHub) I could probably handle this using regular expressions.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Readme HTML has broken internal links 769150394  
746734412 https://github.com/dogsheep/github-to-sqlite/issues/58#issuecomment-746734412 https://api.github.com/repos/dogsheep/github-to-sqlite/issues/58 MDEyOklzc3VlQ29tbWVudDc0NjczNDQxMg== simonw 9599 2020-12-16T17:58:56Z 2020-12-16T17:58:56Z MEMBER

I'm going to rewrite those <a href="#filtering-tables"> links to <a href="#user-content-filtering-tables"> - but only if a corresponding id="user-content-filtering-tables" element exists.
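
A minimal sketch of that rewrite using regular expressions. rewrite_internal_links is a hypothetical name, and the patterns assume double-quoted attributes, as in the HTML GitHub returns.

```python
import re


def rewrite_internal_links(html):
    """Point href="#x" at "#user-content-x" when an element with that id exists."""
    ids = set(re.findall(r'id="([^"]+)"', html))

    def replace(match):
        target = "user-content-" + match.group(1)
        if target in ids:
            return 'href="#{}"'.format(target)
        return match.group(0)  # no matching id, leave the link as it was

    return re.sub(r'href="#([^"]+)"', replace, html)


sample = (
    '<a href="#filtering-tables">Filtering</a>'
    '<h2 id="user-content-filtering-tables">Filtering tables</h2>'
    '<a href="#missing">Missing</a>'
)
rewritten = rewrite_internal_links(sample)
```
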

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Readme HTML has broken internal links 769150394  
739058820 https://github.com/dogsheep/dogsheep-photos/pull/29#issuecomment-739058820 https://api.github.com/repos/dogsheep/dogsheep-photos/issues/29 MDEyOklzc3VlQ29tbWVudDczOTA1ODgyMA== simonw 9599 2020-12-04T22:32:35Z 2020-12-04T22:32:35Z MEMBER

Thanks for this!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Fixed bug in SQL query for photo scores 638375985  
735485677 https://github.com/dogsheep/github-to-sqlite/issues/53#issuecomment-735485677 https://api.github.com/repos/dogsheep/github-to-sqlite/issues/53 MDEyOklzc3VlQ29tbWVudDczNTQ4NTY3Nw== simonw 9599 2020-11-30T00:36:09Z 2020-11-30T00:36:09Z MEMBER

Given rate limits (see #51) this command might be better implemented by running a git clone into a temporary directory - doing so would retrieve all of the files in one go.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Command for fetching file contents 753000405  
735484186 https://github.com/dogsheep/github-to-sqlite/issues/51#issuecomment-735484186 https://api.github.com/repos/dogsheep/github-to-sqlite/issues/51 MDEyOklzc3VlQ29tbWVudDczNTQ4NDE4Ng== simonw 9599 2020-11-30T00:29:31Z 2020-11-30T00:29:31Z MEMBER

This just caused a failure in deploying the demo: https://github.com/dogsheep/github-to-sqlite/runs/1471304407?check_suite_focus=true

  File "/opt/hostedtoolcache/Python/3.8.6/x64/bin/github-to-sqlite", line 33, in <module>
    sys.exit(load_entry_point('github-to-sqlite', 'console_scripts', 'github-to-sqlite')())
  File "/opt/hostedtoolcache/Python/3.8.6/x64/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/opt/hostedtoolcache/Python/3.8.6/x64/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/opt/hostedtoolcache/Python/3.8.6/x64/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/hostedtoolcache/Python/3.8.6/x64/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/hostedtoolcache/Python/3.8.6/x64/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/runner/work/github-to-sqlite/github-to-sqlite/github_to_sqlite/cli.py", line 142, in issue_comments
    for comment in utils.fetch_issue_comments(repo, token, issue):
  File "/home/runner/work/github-to-sqlite/github-to-sqlite/github_to_sqlite/utils.py", line 380, in fetch_issue_comments
    for comments in paginate(url, headers):
  File "/home/runner/work/github-to-sqlite/github-to-sqlite/github_to_sqlite/utils.py", line 472, in paginate
    raise GitHubError.from_response(response)
github_to_sqlite.utils.GitHubError: ('API rate limit exceeded for user ID 9599.', 403)
Error: Process completed with exit code 1.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
github-to-sqlite should handle rate limits better 703246031  
735483820 https://github.com/dogsheep/github-to-sqlite/issues/46#issuecomment-735483820 https://api.github.com/repos/dogsheep/github-to-sqlite/issues/46 MDEyOklzc3VlQ29tbWVudDczNTQ4MzgyMA== simonw 9599 2020-11-30T00:27:47Z 2020-11-30T00:27:47Z MEMBER

So it looks like anything that pulls reviews needs to pull each review, then for each one pull the comments.

I'm going to consider this blocked on smarter rate limit handling in #51.
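
As a starting point for that, here is a rough sketch of deciding how long to pause based on the X-RateLimit-Remaining and X-RateLimit-Reset response headers that the GitHub API sends. seconds_to_wait is a hypothetical helper and is not something github-to-sqlite currently has.

```python
import time


def seconds_to_wait(headers, now=None):
    """Return how many seconds to sleep before the next API request."""
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0
    # X-RateLimit-Reset is a Unix timestamp in seconds.
    reset_at = int(headers.get("X-RateLimit-Reset", "0"))
    current = time.time() if now is None else now
    return max(0, reset_at - current)
```

A paginating loop could call time.sleep(seconds_to_wait(response.headers)) between pages instead of failing with a 403.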

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Feature: pull request reviews and comments 664485022  
735483604 https://github.com/dogsheep/github-to-sqlite/issues/46#issuecomment-735483604 https://api.github.com/repos/dogsheep/github-to-sqlite/issues/46 MDEyOklzc3VlQ29tbWVudDczNTQ4MzYwNA== simonw 9599 2020-11-30T00:26:50Z 2020-11-30T00:26:50Z MEMBER

It seems like there's a lot missing from that - those aren't particularly interesting given the data that is returned.

From the docs at https://docs.github.com/en/free-pro-team@latest/rest/reference/pulls#reviews it looks like each review consists of multiple comments, and the comments are where the useful material is - https://docs.github.com/en/free-pro-team@latest/rest/reference/pulls#list-comments-for-a-pull-request-review

github-to-sqlite get https://api.github.com/repos/dogsheep/github-to-sqlite/pulls/48/reviews/503368921/comments --accept 'application/vnd.github.v3+json'

[
    {
        "id": 500603838,
        "node_id": "MDI0OlB1bGxSZXF1ZXN0UmV2aWV3Q29tbWVudDUwMDYwMzgzOA==",
        "url": "https://api.github.com/repos/dogsheep/github-to-sqlite/pulls/comments/500603838",
        "pull_request_review_id": 503368921,
        "diff_hunk": "@@ -0,0 +1,370 @@\n+[\n+    {\n+        \"url\": \"https://api.github.com/repos/simonw/datasette/pulls/571\",\n+        \"id\": 313384926,\n+        \"node_id\": \"MDExOlB1bGxSZXF1ZXN0MzEzMzg0OTI2\",\n+        \"html_url\": \"https://github.com/simonw/datasette/pull/571\",\n+        \"diff_url\": \"https://github.com/simonw/datasette/pull/571.diff\",\n+        \"patch_url\": \"https://github.com/simonw/datasette/pull/571.patch\",\n+        \"issue_url\": \"https://api.github.com/repos/simonw/datasette/issues/571\",\n+        \"number\": 571,\n+        \"state\": \"closed\",\n+        \"locked\": false,\n+        \"title\": \"detect_fts now works with alternative table escaping\",\n+        \"user\": {\n+          \"login\": \"simonw\",\n+          \"id\": 9599,\n+          \"node_id\": \"MDQ6VXNlcjk1OTk=\",\n+          \"avatar_url\": \"https://avatars0.githubusercontent.com/u/9599?v=4\",\n+          \"gravatar_id\": \"\",\n+          \"url\": \"https://api.github.com/users/simonw\",\n+          \"html_url\": \"https://github.com/simonw\",\n+          \"followers_url\": \"https://api.github.com/users/simonw/followers\",\n+          \"following_url\": \"https://api.github.com/users/simonw/following{/other_user}\",\n+          \"gists_url\": \"https://api.github.com/users/simonw/gists{/gist_id}\",\n+          \"starred_url\": \"https://api.github.com/users/simonw/starred{/owner}{/repo}\",\n+          \"subscriptions_url\": \"https://api.github.com/users/simonw/subscriptions\",\n+          \"organizations_url\": \"https://api.github.com/users/simonw/orgs\",\n+          \"repos_url\": \"https://api.github.com/users/simonw/repos\",\n+          \"events_url\": \"https://api.github.com/users/simonw/events{/privacy}\",\n+          \"received_events_url\": \"https://api.github.com/users/simonw/received_events\",\n+          \"type\": \"User\",\n+          \"site_admin\": false\n+        },\n+        \"body\": \"Fixes #570\",\n+        \"created_at\": 
\"2019-09-03T00:23:39Z\",\n+        \"updated_at\": \"2019-09-03T00:32:28Z\",\n+        \"closed_at\": \"2019-09-03T00:32:28Z\",\n+        \"merged_at\": \"2019-09-03T00:32:28Z\",\n+        \"merge_commit_sha\": \"2dc5c8dc259a0606162673d394ba8cc1c6f54428\",\n+        \"assignee\": null,\n+        \"assignees\": [\n+\n+        ],\n+        \"requested_reviewers\": [\n+\n+        ],\n+        \"requested_teams\": [\n+\n+        ],\n+        \"labels\": [\n+\n+        ],\n+        \"milestone\": null,\n+        \"draft\": false,\n+        \"commits_url\": \"https://api.github.com/repos/simonw/datasette/pulls/571/commits\",\n+        \"review_comments_url\": \"https://api.github.com/repos/simonw/datasette/pulls/571/comments\",\n+        \"review_comment_url\": \"https://api.github.com/repos/simonw/datasette/pulls/comments{/number}\",\n+        \"comments_url\": \"https://api.github.com/repos/simonw/datasette/issues/571/comments\",\n+        \"statuses_url\": \"https://api.github.com/repos/simonw/datasette/statuses/a85239f69261c10f1a9f90514c8b5d113cb94585\",\n+        \"head\": {\n+          \"label\": \"simonw:detect-fts\",\n+          \"ref\": \"detect-fts\",\n+          \"sha\": \"a85239f69261c10f1a9f90514c8b5d113cb94585\",\n+          \"user\": {\n+            \"login\": \"simonw\",\n+            \"id\": 9599,\n+            \"node_id\": \"MDQ6VXNlcjk1OTk=\",\n+            \"avatar_url\": \"https://avatars0.githubusercontent.com/u/9599?v=4\",\n+            \"gravatar_id\": \"\",\n+            \"url\": \"https://api.github.com/users/simonw\",\n+            \"html_url\": \"https://github.com/simonw\",\n+            \"followers_url\": \"https://api.github.com/users/simonw/followers\",\n+            \"following_url\": \"https://api.github.com/users/simonw/following{/other_user}\",\n+            \"gists_url\": \"https://api.github.com/users/simonw/gists{/gist_id}\",\n+            \"starred_url\": \"https://api.github.com/users/simonw/starred{/owner}{/repo}\",\n+           
 \"subscriptions_url\": \"https://api.github.com/users/simonw/subscriptions\",\n+            \"organizations_url\": \"https://api.github.com/users/simonw/orgs\",\n+            \"repos_url\": \"https://api.github.com/users/simonw/repos\",\n+            \"events_url\": \"https://api.github.com/users/simonw/events{/privacy}\",\n+            \"received_events_url\": \"https://api.github.com/users/simonw/received_events\",\n+            \"type\": \"User\",\n+            \"site_admin\": false\n+          },\n+          \"repo\": {\n+            \"id\": 107914493,\n+            \"node_id\": \"MDEwOlJlcG9zaXRvcnkxMDc5MTQ0OTM=\",\n+            \"name\": \"datasette\",\n+            \"full_name\": \"simonw/datasette\",\n+            \"private\": false,\n+            \"owner\": {\n+              \"login\": \"simonw\",\n+              \"id\": 9599,\n+              \"node_id\": \"MDQ6VXNlcjk1OTk=\",\n+              \"avatar_url\": \"https://avatars0.githubusercontent.com/u/9599?v=4\",\n+              \"gravatar_id\": \"\",\n+              \"url\": \"https://api.github.com/users/simonw\",\n+              \"html_url\": \"https://github.com/simonw\",\n+              \"followers_url\": \"https://api.github.com/users/simonw/followers\",\n+              \"following_url\": \"https://api.github.com/users/simonw/following{/other_user}\",\n+              \"gists_url\": \"https://api.github.com/users/simonw/gists{/gist_id}\",\n+              \"starred_url\": \"https://api.github.com/users/simonw/starred{/owner}{/repo}\",\n+              \"subscriptions_url\": \"https://api.github.com/users/simonw/subscriptions\",\n+              \"organizations_url\": \"https://api.github.com/users/simonw/orgs\",\n+              \"repos_url\": \"https://api.github.com/users/simonw/repos\",\n+              \"events_url\": \"https://api.github.com/users/simonw/events{/privacy}\",\n+              \"received_events_url\": \"https://api.github.com/users/simonw/received_events\",\n+              \"type\": 
\"User\",\n+              \"site_admin\": false\n+            },\n+            \"html_url\": \"https://github.com/simonw/datasette\",\n+            \"description\": \"An open source multi-tool for exploring and publishing data\",\n+            \"fork\": false,\n+            \"url\": \"https://api.github.com/repos/simonw/datasette\",\n+            \"forks_url\": \"https://api.github.com/repos/simonw/datasette/forks\",\n+            \"keys_url\": \"https://api.github.com/repos/simonw/datasette/keys{/key_id}\",\n+            \"collaborators_url\": \"https://api.github.com/repos/simonw/datasette/collaborators{/collaborator}\",\n+            \"teams_url\": \"https://api.github.com/repos/simonw/datasette/teams\",\n+            \"hooks_url\": \"https://api.github.com/repos/simonw/datasette/hooks\",\n+            \"issue_events_url\": \"https://api.github.com/repos/simonw/datasette/issues/events{/number}\",\n+            \"events_url\": \"https://api.github.com/repos/simonw/datasette/events\",\n+            \"assignees_url\": \"https://api.github.com/repos/simonw/datasette/assignees{/user}\",\n+            \"branches_url\": \"https://api.github.com/repos/simonw/datasette/branches{/branch}\",\n+            \"tags_url\": \"https://api.github.com/repos/simonw/datasette/tags\",\n+            \"blobs_url\": \"https://api.github.com/repos/simonw/datasette/git/blobs{/sha}\",\n+            \"git_tags_url\": \"https://api.github.com/repos/simonw/datasette/git/tags{/sha}\",\n+            \"git_refs_url\": \"https://api.github.com/repos/simonw/datasette/git/refs{/sha}\",\n+            \"trees_url\": \"https://api.github.com/repos/simonw/datasette/git/trees{/sha}\",\n+            \"statuses_url\": \"https://api.github.com/repos/simonw/datasette/statuses/{sha}\",\n+            \"languages_url\": \"https://api.github.com/repos/simonw/datasette/languages\",\n+            \"stargazers_url\": \"https://api.github.com/repos/simonw/datasette/stargazers\",\n+            \"contributors_url\": 
\"https://api.github.com/repos/simonw/datasette/contributors\",\n+            \"subscribers_url\": \"https://api.github.com/repos/simonw/datasette/subscribers\",\n+            \"subscription_url\": \"https://api.github.com/repos/simonw/datasette/subscription\",\n+            \"commits_url\": \"https://api.github.com/repos/simonw/datasette/commits{/sha}\",\n+            \"git_commits_url\": \"https://api.github.com/repos/simonw/datasette/git/commits{/sha}\",\n+            \"comments_url\": \"https://api.github.com/repos/simonw/datasette/comments{/number}\",\n+            \"issue_comment_url\": \"https://api.github.com/repos/simonw/datasette/issues/comments{/number}\",\n+            \"contents_url\": \"https://api.github.com/repos/simonw/datasette/contents/{+path}\",\n+            \"compare_url\": \"https://api.github.com/repos/simonw/datasette/compare/{base}...{head}\",\n+            \"merges_url\": \"https://api.github.com/repos/simonw/datasette/merges\",\n+            \"archive_url\": \"https://api.github.com/repos/simonw/datasette/{archive_format}{/ref}\",\n+            \"downloads_url\": \"https://api.github.com/repos/simonw/datasette/downloads\",\n+            \"issues_url\": \"https://api.github.com/repos/simonw/datasette/issues{/number}\",\n+            \"pulls_url\": \"https://api.github.com/repos/simonw/datasette/pulls{/number}\",\n+            \"milestones_url\": \"https://api.github.com/repos/simonw/datasette/milestones{/number}\",\n+            \"notifications_url\": \"https://api.github.com/repos/simonw/datasette/notifications{?since,all,participating}\",\n+            \"labels_url\": \"https://api.github.com/repos/simonw/datasette/labels{/name}\",\n+            \"releases_url\": \"https://api.github.com/repos/simonw/datasette/releases{/id}\",\n+            \"deployments_url\": \"https://api.github.com/repos/simonw/datasette/deployments\",\n+            \"created_at\": \"2017-10-23T00:39:03Z\",\n+            \"updated_at\": \"2020-07-27T20:42:15Z\",\n+  
          \"pushed_at\": \"2020-07-26T01:21:05Z\",\n+            \"git_url\": \"git://github.com/simonw/datasette.git\",\n+            \"ssh_url\": \"git@github.com:simonw/datasette.git\",\n+            \"clone_url\": \"https://github.com/simonw/datasette.git\",\n+            \"svn_url\": \"https://github.com/simonw/datasette\",\n+            \"homepage\": \"http://datasette.readthedocs.io/\",\n+            \"size\": 3487,\n+            \"stargazers_count\": 3642,\n+            \"watchers_count\": 3642,\n+            \"language\": \"Python\",\n+            \"has_issues\": true,\n+            \"has_projects\": false,\n+            \"has_downloads\": true,\n+            \"has_wiki\": true,\n+            \"has_pages\": false,\n+            \"forks_count\": 206,\n+            \"mirror_url\": null,\n+            \"archived\": false,\n+            \"disabled\": false,\n+            \"open_issues_count\": 190,\n+            \"license\": {\n+              \"key\": \"apache-2.0\",\n+              \"name\": \"Apache License 2.0\",\n+              \"spdx_id\": \"Apache-2.0\",\n+              \"url\": \"https://api.github.com/licenses/apache-2.0\",\n+              \"node_id\": \"MDc6TGljZW5zZTI=\"\n+            },\n+            \"forks\": 206,\n+            \"open_issues\": 190,\n+            \"watchers\": 3642,\n+            \"default_branch\": \"master\"\n+          }\n+        },\n+        \"base\": {\n+          \"label\": \"simonw:master\",\n+          \"ref\": \"master\",\n+          \"sha\": \"f04deebec4f3842f7bd610cd5859de529f77d50e\",\n+          \"user\": {\n+            \"login\": \"simonw\",\n+            \"id\": 9599,\n+            \"node_id\": \"MDQ6VXNlcjk1OTk=\",\n+            \"avatar_url\": \"https://avatars0.githubusercontent.com/u/9599?v=4\",\n+            \"gravatar_id\": \"\",\n+            \"url\": \"https://api.github.com/users/simonw\",\n+            \"html_url\": \"https://github.com/simonw\",\n+            \"followers_url\": 
\"https://api.github.com/users/simonw/followers\",\n+            \"following_url\": \"https://api.github.com/users/simonw/following{/other_user}\",\n+            \"gists_url\": \"https://api.github.com/users/simonw/gists{/gist_id}\",\n+            \"starred_url\": \"https://api.github.com/users/simonw/starred{/owner}{/repo}\",\n+            \"subscriptions_url\": \"https://api.github.com/users/simonw/subscriptions\",\n+            \"organizations_url\": \"https://api.github.com/users/simonw/orgs\",\n+            \"repos_url\": \"https://api.github.com/users/simonw/repos\",\n+            \"events_url\": \"https://api.github.com/users/simonw/events{/privacy}\",\n+            \"received_events_url\": \"https://api.github.com/users/simonw/received_events\",\n+            \"type\": \"User\",\n+            \"site_admin\": false\n+          },\n+          \"repo\": {\n+            \"id\": 107914493,\n+            \"node_id\": \"MDEwOlJlcG9zaXRvcnkxMDc5MTQ0OTM=\",\n+            \"name\": \"datasette\",\n+            \"full_name\": \"simonw/datasette\",\n+            \"private\": false,\n+            \"owner\": {\n+              \"login\": \"simonw\",\n+              \"id\": 9599,\n+              \"node_id\": \"MDQ6VXNlcjk1OTk=\",\n+              \"avatar_url\": \"https://avatars0.githubusercontent.com/u/9599?v=4\",\n+              \"gravatar_id\": \"\",\n+              \"url\": \"https://api.github.com/users/simonw\",\n+              \"html_url\": \"https://github.com/simonw\",\n+              \"followers_url\": \"https://api.github.com/users/simonw/followers\",\n+              \"following_url\": \"https://api.github.com/users/simonw/following{/other_user}\",\n+              \"gists_url\": \"https://api.github.com/users/simonw/gists{/gist_id}\",\n+              \"starred_url\": \"https://api.github.com/users/simonw/starred{/owner}{/repo}\",\n+              \"subscriptions_url\": \"https://api.github.com/users/simonw/subscriptions\",\n+              \"organizations_url\": 
\"https://api.github.com/users/simonw/orgs\",\n+              \"repos_url\": \"https://api.github.com/users/simonw/repos\",\n+              \"events_url\": \"https://api.github.com/users/simonw/events{/privacy}\",\n+              \"received_events_url\": \"https://api.github.com/users/simonw/received_events\",\n+              \"type\": \"User\",\n+              \"site_admin\": false\n+            },\n+            \"html_url\": \"https://github.com/simonw/datasette\",\n+            \"description\": \"An open source multi-tool for exploring and publishing data\",\n+            \"fork\": false,\n+            \"url\": \"https://api.github.com/repos/simonw/datasette\",\n+            \"forks_url\": \"https://api.github.com/repos/simonw/datasette/forks\",\n+            \"keys_url\": \"https://api.github.com/repos/simonw/datasette/keys{/key_id}\",\n+            \"collaborators_url\": \"https://api.github.com/repos/simonw/datasette/collaborators{/collaborator}\",\n+            \"teams_url\": \"https://api.github.com/repos/simonw/datasette/teams\",\n+            \"hooks_url\": \"https://api.github.com/repos/simonw/datasette/hooks\",\n+            \"issue_events_url\": \"https://api.github.com/repos/simonw/datasette/issues/events{/number}\",\n+            \"events_url\": \"https://api.github.com/repos/simonw/datasette/events\",\n+            \"assignees_url\": \"https://api.github.com/repos/simonw/datasette/assignees{/user}\",\n+            \"branches_url\": \"https://api.github.com/repos/simonw/datasette/branches{/branch}\",\n+            \"tags_url\": \"https://api.github.com/repos/simonw/datasette/tags\",\n+            \"blobs_url\": \"https://api.github.com/repos/simonw/datasette/git/blobs{/sha}\",\n+            \"git_tags_url\": \"https://api.github.com/repos/simonw/datasette/git/tags{/sha}\",\n+            \"git_refs_url\": \"https://api.github.com/repos/simonw/datasette/git/refs{/sha}\",\n+            \"trees_url\": 
\"https://api.github.com/repos/simonw/datasette/git/trees{/sha}\",\n+            \"statuses_url\": \"https://api.github.com/repos/simonw/datasette/statuses/{sha}\",\n+            \"languages_url\": \"https://api.github.com/repos/simonw/datasette/languages\",\n+            \"stargazers_url\": \"https://api.github.com/repos/simonw/datasette/stargazers\",\n+            \"contributors_url\": \"https://api.github.com/repos/simonw/datasette/contributors\",\n+            \"subscribers_url\": \"https://api.github.com/repos/simonw/datasette/subscribers\",\n+            \"subscription_url\": \"https://api.github.com/repos/simonw/datasette/subscription\",\n+            \"commits_url\": \"https://api.github.com/repos/simonw/datasette/commits{/sha}\",\n+            \"git_commits_url\": \"https://api.github.com/repos/simonw/datasette/git/commits{/sha}\",\n+            \"comments_url\": \"https://api.github.com/repos/simonw/datasette/comments{/number}\",\n+            \"issue_comment_url\": \"https://api.github.com/repos/simonw/datasette/issues/comments{/number}\",\n+            \"contents_url\": \"https://api.github.com/repos/simonw/datasette/contents/{+path}\",\n+            \"compare_url\": \"https://api.github.com/repos/simonw/datasette/compare/{base}...{head}\",\n+            \"merges_url\": \"https://api.github.com/repos/simonw/datasette/merges\",\n+            \"archive_url\": \"https://api.github.com/repos/simonw/datasette/{archive_format}{/ref}\",\n+            \"downloads_url\": \"https://api.github.com/repos/simonw/datasette/downloads\",\n+            \"issues_url\": \"https://api.github.com/repos/simonw/datasette/issues{/number}\",\n+            \"pulls_url\": \"https://api.github.com/repos/simonw/datasette/pulls{/number}\",\n+            \"milestones_url\": \"https://api.github.com/repos/simonw/datasette/milestones{/number}\",\n+            \"notifications_url\": \"https://api.github.com/repos/simonw/datasette/notifications{?since,all,participating}\",\n+            
\"labels_url\": \"https://api.github.com/repos/simonw/datasette/labels{/name}\",\n+            \"releases_url\": \"https://api.github.com/repos/simonw/datasette/releases{/id}\",\n+            \"deployments_url\": \"https://api.github.com/repos/simonw/datasette/deployments\",\n+            \"created_at\": \"2017-10-23T00:39:03Z\",\n+            \"updated_at\": \"2020-07-27T20:42:15Z\",\n+            \"pushed_at\": \"2020-07-26T01:21:05Z\",\n+            \"git_url\": \"git://github.com/simonw/datasette.git\",\n+            \"ssh_url\": \"git@github.com:simonw/datasette.git\",\n+            \"clone_url\": \"https://github.com/simonw/datasette.git\",\n+            \"svn_url\": \"https://github.com/simonw/datasette\",\n+            \"homepage\": \"http://datasette.readthedocs.io/\",\n+            \"size\": 3487,\n+            \"stargazers_count\": 3642,\n+            \"watchers_count\": 3642,\n+            \"language\": \"Python\",\n+            \"has_issues\": true,\n+            \"has_projects\": false,\n+            \"has_downloads\": true,\n+            \"has_wiki\": true,\n+            \"has_pages\": false,\n+            \"forks_count\": 206,\n+            \"mirror_url\": null,\n+            \"archived\": false,\n+            \"disabled\": false,\n+            \"open_issues_count\": 190,\n+            \"license\": {\n+              \"key\": \"apache-2.0\",\n+              \"name\": \"Apache License 2.0\",\n+              \"spdx_id\": \"Apache-2.0\",\n+              \"url\": \"https://api.github.com/licenses/apache-2.0\",\n+              \"node_id\": \"MDc6TGljZW5zZTI=\"\n+            },\n+            \"forks\": 206,\n+            \"open_issues\": 190,\n+            \"watchers\": 3642,\n+            \"default_branch\": \"master\"\n+          }\n+        },\n+        \"_links\": {\n+          \"self\": {\n+            \"href\": \"https://api.github.com/repos/simonw/datasette/pulls/571\"\n+          },\n+          \"html\": {\n+            \"href\": 
\"https://github.com/simonw/datasette/pull/571\"\n+          },\n+          \"issue\": {\n+            \"href\": \"https://api.github.com/repos/simonw/datasette/issues/571\"\n+          },\n+          \"comments\": {\n+            \"href\": \"https://api.github.com/repos/simonw/datasette/issues/571/comments\"\n+          },\n+          \"review_comments\": {\n+            \"href\": \"https://api.github.com/repos/simonw/datasette/pulls/571/comments\"\n+          },\n+          \"review_comment\": {\n+            \"href\": \"https://api.github.com/repos/simonw/datasette/pulls/comments{/number}\"\n+          },\n+          \"commits\": {\n+            \"href\": \"https://api.github.com/repos/simonw/datasette/pulls/571/commits\"\n+          },\n+          \"statuses\": {\n+            \"href\": \"https://api.github.com/repos/simonw/datasette/statuses/a85239f69261c10f1a9f90514c8b5d113cb94585\"\n+          }\n+        },\n+        \"author_association\": \"OWNER\",\n+        \"active_lock_reason\": null,\n+        \"merged\": true,\n+        \"mergeable\": null,\n+        \"rebaseable\": null,\n+        \"mergeable_state\": \"unknown\",\n+        \"merged_by\": {",
        "path": "tests/pull_requests.json",
        "position": 342,
        "original_position": 342,
        "commit_id": "3a0d5c498f9faae4e40aab204cd01b965a4f61f3",
        "user": {
            "login": "simonw",
            "id": 9599,
            "node_id": "MDQ6VXNlcjk1OTk=",
            "avatar_url": "https://avatars0.githubusercontent.com/u/9599?u=5968723deb1a55b82620e106f5ca58e9b11a0942&v=4",
            "gravatar_id": "",
            "url": "https://api.github.com/users/simonw",
            "html_url": "https://github.com/simonw",
            "followers_url": "https://api.github.com/users/simonw/followers",
            "following_url": "https://api.github.com/users/simonw/following{/other_user}",
            "gists_url": "https://api.github.com/users/simonw/gists{/gist_id}",
            "starred_url": "https://api.github.com/users/simonw/starred{/owner}{/repo}",
            "subscriptions_url": "https://api.github.com/users/simonw/subscriptions",
            "organizations_url": "https://api.github.com/users/simonw/orgs",
            "repos_url": "https://api.github.com/users/simonw/repos",
            "events_url": "https://api.github.com/users/simonw/events{/privacy}",
            "received_events_url": "https://api.github.com/users/simonw/received_events",
            "type": "User",
            "site_admin": false
        },
        "body": "Running this should create a `merged_by` column on the `pull_requests` table which is a foreign key to the `users` table.",
        "created_at": "2020-10-06T21:22:47Z",
        "updated_at": "2020-10-20T20:56:33Z",
        "html_url": "https://github.com/dogsheep/github-to-sqlite/pull/48#discussion_r500603838",
        "pull_request_url": "https://api.github.com/repos/dogsheep/github-to-sqlite/pulls/48",
        "author_association": "MEMBER",
        "_links": {
            "self": {
                "href": "https://api.github.com/repos/dogsheep/github-to-sqlite/pulls/comments/500603838"
            },
            "html": {
                "href": "https://github.com/dogsheep/github-to-sqlite/pull/48#discussion_r500603838"
            },
            "pull_request": {
                "href": "https://api.github.com/repos/dogsheep/github-to-sqlite/pulls/48"
            }
        },
        "original_commit_id": "4f33b850bd37829262dd29e1c520afffebedc19c"
    },
    {
        "id": 500606198,
        "node_id": "MDI0OlB1bGxSZXF1ZXN0UmV2aWV3Q29tbWVudDUwMDYwNjE5OA==",
        "url": "https://api.github.com/repos/dogsheep/github-to-sqlite/pulls/comments/500606198",
        "pull_request_review_id": 503368921,
        "diff_hunk": "@@ -0,0 +1,124 @@\n+from github_to_sqlite import utils\n+import pytest\n+import pathlib\n+import sqlite_utils\n+from sqlite_utils.db import ForeignKey\n+import json\n+\n+\n+@pytest.fixture\n+def pull_requests():\n+    return json.load(open(pathlib.Path(__file__).parent / \"pull_requests.json\"))\n+\n+\n+@pytest.fixture\n+def db(pull_requests):\n+    db = sqlite_utils.Database(memory=True)\n+    db[\"repos\"].insert(\n+        {\"id\": 1},\n+        pk=\"id\",\n+        columns={\"organization\": int, \"topics\": str, \"name\": str, \"description\": str},\n+    )\n+    utils.save_pull_requests(db, pull_requests, {\"id\": 1})\n+    return db\n+\n+\n+def test_tables(db):\n+    assert {\"pull_requests\", \"users\", \"repos\", \"milestones\"} == set(\n+        db.table_names()\n+    )\n+    assert {\n+        ForeignKey(\n+            table=\"pull_requests\", column=\"repo\", other_table=\"repos\", other_column=\"id\"\n+        ),\n+        ForeignKey(\n+            table=\"pull_requests\",\n+            column=\"milestone\",\n+            other_table=\"milestones\",\n+            other_column=\"id\",\n+        ),\n+        ForeignKey(\n+            table=\"pull_requests\", column=\"assignee\", other_table=\"users\", other_column=\"id\"\n+        ),\n+        ForeignKey(\n+            table=\"pull_requests\", column=\"user\", other_table=\"users\", other_column=\"id\"\n+        ),\n+    } == set(db[\"pull_requests\"].foreign_keys)\n+\n+\n+def test_pull_requests(db):\n+    pull_request_rows = list(db[\"pull_requests\"].rows)\n+    assert [\n+        {\n+            'id': 313384926,\n+            'node_id': 'MDExOlB1bGxSZXF1ZXN0MzEzMzg0OTI2',\n+            'number': 571,\n+            'state': 'closed',\n+            'locked': 0,\n+            'title': 'detect_fts now works with alternative table escaping',\n+            'user': 9599,\n+            'body': 'Fixes #570',\n+            'created_at': '2019-09-03T00:23:39Z',\n+            'updated_at': 
'2019-09-03T00:32:28Z',\n+            'closed_at': '2019-09-03T00:32:28Z',\n+            'merged_at': '2019-09-03T00:32:28Z',\n+            'merge_commit_sha': '2dc5c8dc259a0606162673d394ba8cc1c6f54428',\n+            'assignee': None,\n+            'milestone': None,\n+            'draft': 0,\n+            'head': 'a85239f69261c10f1a9f90514c8b5d113cb94585',\n+            'base': 'f04deebec4f3842f7bd610cd5859de529f77d50e',\n+            'author_association': 'OWNER',\n+            'merged': 1,\n+            'mergeable': None,\n+            'rebaseable': None,\n+            'mergeable_state': 'unknown',\n+            'merged_by': '{\"login\": \"simonw\", \"id\": 9599, \"node_id\": \"MDQ6VXNlcjk1OTk=\", \"avatar_url\": \"https://avatars0.githubusercontent.com/u/9599?v=4\", \"gravatar_id\": \"\", \"url\": \"https://api.github.com/users/simonw\", \"html_url\": \"https://github.com/simonw\", \"followers_url\": \"https://api.github.com/users/simonw/followers\", \"following_url\": \"https://api.github.com/users/simonw/following{/other_user}\", \"gists_url\": \"https://api.github.com/users/simonw/gists{/gist_id}\", \"starred_url\": \"https://api.github.com/users/simonw/starred{/owner}{/repo}\", \"subscriptions_url\": \"https://api.github.com/users/simonw/subscriptions\", \"organizations_url\": \"https://api.github.com/users/simonw/orgs\", \"repos_url\": \"https://api.github.com/users/simonw/repos\", \"events_url\": \"https://api.github.com/users/simonw/events{/privacy}\", \"received_events_url\": \"https://api.github.com/users/simonw/received_events\", \"type\": \"User\", \"site_admin\": false}',",
        "path": "tests/test_pull_requests.py",
        "position": null,
        "original_position": 76,
        "commit_id": "3a0d5c498f9faae4e40aab204cd01b965a4f61f3",
        "user": {
            "login": "simonw",
            "id": 9599,
            "node_id": "MDQ6VXNlcjk1OTk=",
            "avatar_url": "https://avatars0.githubusercontent.com/u/9599?u=5968723deb1a55b82620e106f5ca58e9b11a0942&v=4",
            "gravatar_id": "",
            "url": "https://api.github.com/users/simonw",
            "html_url": "https://github.com/simonw",
            "followers_url": "https://api.github.com/users/simonw/followers",
            "following_url": "https://api.github.com/users/simonw/following{/other_user}",
            "gists_url": "https://api.github.com/users/simonw/gists{/gist_id}",
            "starred_url": "https://api.github.com/users/simonw/starred{/owner}{/repo}",
            "subscriptions_url": "https://api.github.com/users/simonw/subscriptions",
            "organizations_url": "https://api.github.com/users/simonw/orgs",
            "repos_url": "https://api.github.com/users/simonw/repos",
            "events_url": "https://api.github.com/users/simonw/events{/privacy}",
            "received_events_url": "https://api.github.com/users/simonw/received_events",
            "type": "User",
            "site_admin": false
        },
        "body": "See above - this should be 9599, an integer reference to the row in the users table.",
        "created_at": "2020-10-06T21:27:43Z",
        "updated_at": "2020-10-20T20:56:33Z",
        "html_url": "https://github.com/dogsheep/github-to-sqlite/pull/48#discussion_r500606198",
        "pull_request_url": "https://api.github.com/repos/dogsheep/github-to-sqlite/pulls/48",
        "author_association": "MEMBER",
        "_links": {
            "self": {
                "href": "https://api.github.com/repos/dogsheep/github-to-sqlite/pulls/comments/500606198"
            },
            "html": {
                "href": "https://github.com/dogsheep/github-to-sqlite/pull/48#discussion_r500606198"
            },
            "pull_request": {
                "href": "https://api.github.com/repos/dogsheep/github-to-sqlite/pulls/48"
            }
        },
        "original_commit_id": "4f33b850bd37829262dd29e1c520afffebedc19c"
    },
    {
        "id": 500606665,
        "node_id": "MDI0OlB1bGxSZXF1ZXN0UmV2aWV3Q29tbWVudDUwMDYwNjY2NQ==",
        "url": "https://api.github.com/repos/dogsheep/github-to-sqlite/pulls/comments/500606665",
        "pull_request_review_id": 503368921,
        "diff_hunk": "@@ -0,0 +1,124 @@\n+from github_to_sqlite import utils\n+import pytest\n+import pathlib\n+import sqlite_utils\n+from sqlite_utils.db import ForeignKey\n+import json\n+\n+\n+@pytest.fixture\n+def pull_requests():\n+    return json.load(open(pathlib.Path(__file__).parent / \"pull_requests.json\"))\n+\n+\n+@pytest.fixture\n+def db(pull_requests):\n+    db = sqlite_utils.Database(memory=True)\n+    db[\"repos\"].insert(\n+        {\"id\": 1},\n+        pk=\"id\",\n+        columns={\"organization\": int, \"topics\": str, \"name\": str, \"description\": str},\n+    )\n+    utils.save_pull_requests(db, pull_requests, {\"id\": 1})\n+    return db\n+\n+\n+def test_tables(db):\n+    assert {\"pull_requests\", \"users\", \"repos\", \"milestones\"} == set(\n+        db.table_names()\n+    )\n+    assert {\n+        ForeignKey(\n+            table=\"pull_requests\", column=\"repo\", other_table=\"repos\", other_column=\"id\"\n+        ),\n+        ForeignKey(\n+            table=\"pull_requests\",\n+            column=\"milestone\",\n+            other_table=\"milestones\",\n+            other_column=\"id\",\n+        ),\n+        ForeignKey(\n+            table=\"pull_requests\", column=\"assignee\", other_table=\"users\", other_column=\"id\"\n+        ),\n+        ForeignKey(\n+            table=\"pull_requests\", column=\"user\", other_table=\"users\", other_column=\"id\"\n+        ),\n+    } == set(db[\"pull_requests\"].foreign_keys)\n+\n+\n+def test_pull_requests(db):\n+    pull_request_rows = list(db[\"pull_requests\"].rows)\n+    assert [\n+        {\n+            'id': 313384926,",
        "path": "tests/test_pull_requests.py",
        "position": null,
        "original_position": 53,
        "commit_id": "3a0d5c498f9faae4e40aab204cd01b965a4f61f3",
        "user": {
            "login": "simonw",
            "id": 9599,
            "node_id": "MDQ6VXNlcjk1OTk=",
            "avatar_url": "https://avatars0.githubusercontent.com/u/9599?u=5968723deb1a55b82620e106f5ca58e9b11a0942&v=4",
            "gravatar_id": "",
            "url": "https://api.github.com/users/simonw",
            "html_url": "https://github.com/simonw",
            "followers_url": "https://api.github.com/users/simonw/followers",
            "following_url": "https://api.github.com/users/simonw/following{/other_user}",
            "gists_url": "https://api.github.com/users/simonw/gists{/gist_id}",
            "starred_url": "https://api.github.com/users/simonw/starred{/owner}{/repo}",
            "subscriptions_url": "https://api.github.com/users/simonw/subscriptions",
            "organizations_url": "https://api.github.com/users/simonw/orgs",
            "repos_url": "https://api.github.com/users/simonw/repos",
            "events_url": "https://api.github.com/users/simonw/events{/privacy}",
            "received_events_url": "https://api.github.com/users/simonw/received_events",
            "type": "User",
            "site_admin": false
        },
        "body": "Minor detail: I use Black for this repo, which requires double quotes - running \"black .\" in the root directory (with the latest version of Black) should handle this for you.",
        "created_at": "2020-10-06T21:28:31Z",
        "updated_at": "2020-10-20T20:56:33Z",
        "html_url": "https://github.com/dogsheep/github-to-sqlite/pull/48#discussion_r500606665",
        "pull_request_url": "https://api.github.com/repos/dogsheep/github-to-sqlite/pulls/48",
        "author_association": "MEMBER",
        "_links": {
            "self": {
                "href": "https://api.github.com/repos/dogsheep/github-to-sqlite/pulls/comments/500606665"
            },
            "html": {
                "href": "https://github.com/dogsheep/github-to-sqlite/pull/48#discussion_r500606665"
            },
            "pull_request": {
                "href": "https://api.github.com/repos/dogsheep/github-to-sqlite/pulls/48"
            }
        },
        "original_commit_id": "4f33b850bd37829262dd29e1c520afffebedc19c"
    }
]

That's a lot more interesting.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Feature: pull request reviews and comments 664485022  
735482546 https://github.com/dogsheep/github-to-sqlite/issues/46#issuecomment-735482546 https://api.github.com/repos/dogsheep/github-to-sqlite/issues/46 MDEyOklzc3VlQ29tbWVudDczNTQ4MjU0Ng== simonw 9599 2020-11-30T00:22:02Z 2020-11-30T00:22:02Z MEMBER

As for reviews... here's the output of github-to-sqlite get https://api.github.com/repos/dogsheep/github-to-sqlite/pulls/48/reviews --accept 'application/vnd.github.v3+json'

[
    {
        "id": 503368921,
        "node_id": "MDE3OlB1bGxSZXF1ZXN0UmV2aWV3NTAzMzY4OTIx",
        "user": {
            "login": "simonw",
            "id": 9599,
            "node_id": "MDQ6VXNlcjk1OTk=",
            "avatar_url": "https://avatars0.githubusercontent.com/u/9599?u=5968723deb1a55b82620e106f5ca58e9b11a0942&v=4",
            "gravatar_id": "",
            "url": "https://api.github.com/users/simonw",
            "html_url": "https://github.com/simonw",
            "followers_url": "https://api.github.com/users/simonw/followers",
            "following_url": "https://api.github.com/users/simonw/following{/other_user}",
            "gists_url": "https://api.github.com/users/simonw/gists{/gist_id}",
            "starred_url": "https://api.github.com/users/simonw/starred{/owner}{/repo}",
            "subscriptions_url": "https://api.github.com/users/simonw/subscriptions",
            "organizations_url": "https://api.github.com/users/simonw/orgs",
            "repos_url": "https://api.github.com/users/simonw/repos",
            "events_url": "https://api.github.com/users/simonw/events{/privacy}",
            "received_events_url": "https://api.github.com/users/simonw/received_events",
            "type": "User",
            "site_admin": false
        },
        "body": "",
        "state": "CHANGES_REQUESTED",
        "html_url": "https://github.com/dogsheep/github-to-sqlite/pull/48#pullrequestreview-503368921",
        "pull_request_url": "https://api.github.com/repos/dogsheep/github-to-sqlite/pulls/48",
        "author_association": "MEMBER",
        "_links": {
            "html": {
                "href": "https://github.com/dogsheep/github-to-sqlite/pull/48#pullrequestreview-503368921"
            },
            "pull_request": {
                "href": "https://api.github.com/repos/dogsheep/github-to-sqlite/pulls/48"
            }
        },
        "submitted_at": "2020-10-06T21:28:40Z",
        "commit_id": "4f33b850bd37829262dd29e1c520afffebedc19c"
    },
    {
        "id": 513118561,
        "node_id": "MDE3OlB1bGxSZXF1ZXN0UmV2aWV3NTEzMTE4NTYx",
        "user": {
            "login": "adamjonas",
            "id": 755825,
            "node_id": "MDQ6VXNlcjc1NTgyNQ==",
            "avatar_url": "https://avatars1.githubusercontent.com/u/755825?v=4",
            "gravatar_id": "",
            "url": "https://api.github.com/users/adamjonas",
            "html_url": "https://github.com/adamjonas",
            "followers_url": "https://api.github.com/users/adamjonas/followers",
            "following_url": "https://api.github.com/users/adamjonas/following{/other_user}",
            "gists_url": "https://api.github.com/users/adamjonas/gists{/gist_id}",
            "starred_url": "https://api.github.com/users/adamjonas/starred{/owner}{/repo}",
            "subscriptions_url": "https://api.github.com/users/adamjonas/subscriptions",
            "organizations_url": "https://api.github.com/users/adamjonas/orgs",
            "repos_url": "https://api.github.com/users/adamjonas/repos",
            "events_url": "https://api.github.com/users/adamjonas/events{/privacy}",
            "received_events_url": "https://api.github.com/users/adamjonas/received_events",
            "type": "User",
            "site_admin": false
        },
        "body": "",
        "state": "COMMENTED",
        "html_url": "https://github.com/dogsheep/github-to-sqlite/pull/48#pullrequestreview-513118561",
        "pull_request_url": "https://api.github.com/repos/dogsheep/github-to-sqlite/pulls/48",
        "author_association": "CONTRIBUTOR",
        "_links": {
            "html": {
                "href": "https://github.com/dogsheep/github-to-sqlite/pull/48#pullrequestreview-513118561"
            },
            "pull_request": {
                "href": "https://api.github.com/repos/dogsheep/github-to-sqlite/pulls/48"
            }
        },
        "submitted_at": "2020-10-20T20:45:05Z",
        "commit_id": "4f33b850bd37829262dd29e1c520afffebedc19c"
    },
    {
        "id": 513127529,
        "node_id": "MDE3OlB1bGxSZXF1ZXN0UmV2aWV3NTEzMTI3NTI5",
        "user": {
            "login": "adamjonas",
            "id": 755825,
            "node_id": "MDQ6VXNlcjc1NTgyNQ==",
            "avatar_url": "https://avatars1.githubusercontent.com/u/755825?v=4",
            "gravatar_id": "",
            "url": "https://api.github.com/users/adamjonas",
            "html_url": "https://github.com/adamjonas",
            "followers_url": "https://api.github.com/users/adamjonas/followers",
            "following_url": "https://api.github.com/users/adamjonas/following{/other_user}",
            "gists_url": "https://api.github.com/users/adamjonas/gists{/gist_id}",
            "starred_url": "https://api.github.com/users/adamjonas/starred{/owner}{/repo}",
            "subscriptions_url": "https://api.github.com/users/adamjonas/subscriptions",
            "organizations_url": "https://api.github.com/users/adamjonas/orgs",
            "repos_url": "https://api.github.com/users/adamjonas/repos",
            "events_url": "https://api.github.com/users/adamjonas/events{/privacy}",
            "received_events_url": "https://api.github.com/users/adamjonas/received_events",
            "type": "User",
            "site_admin": false
        },
        "body": "",
        "state": "COMMENTED",
        "html_url": "https://github.com/dogsheep/github-to-sqlite/pull/48#pullrequestreview-513127529",
        "pull_request_url": "https://api.github.com/repos/dogsheep/github-to-sqlite/pulls/48",
        "author_association": "CONTRIBUTOR",
        "_links": {
            "html": {
                "href": "https://github.com/dogsheep/github-to-sqlite/pull/48#pullrequestreview-513127529"
            },
            "pull_request": {
                "href": "https://api.github.com/repos/dogsheep/github-to-sqlite/pulls/48"
            }
        },
        "submitted_at": "2020-10-20T20:57:33Z",
        "commit_id": "3a0d5c498f9faae4e40aab204cd01b965a4f61f3"
    }
]
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Feature: pull request reviews and comments 664485022  
735482187 https://github.com/dogsheep/github-to-sqlite/issues/46#issuecomment-735482187 https://api.github.com/repos/dogsheep/github-to-sqlite/issues/46 MDEyOklzc3VlQ29tbWVudDczNTQ4MjE4Nw== simonw 9599 2020-11-30T00:20:11Z 2020-11-30T00:20:11Z MEMBER

Pull requests are now added, thanks to @adamjonas.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Feature: pull request reviews and comments 664485022  
735465708 https://github.com/dogsheep/github-to-sqlite/issues/54#issuecomment-735465708 https://api.github.com/repos/dogsheep/github-to-sqlite/issues/54 MDEyOklzc3VlQ29tbWVudDczNTQ2NTcwOA== simonw 9599 2020-11-29T22:08:46Z 2020-11-29T22:08:46Z MEMBER
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
github-to-sqlite workflows command 753026003  
735464493 https://github.com/dogsheep/github-to-sqlite/issues/54#issuecomment-735464493 https://api.github.com/repos/dogsheep/github-to-sqlite/issues/54 MDEyOklzc3VlQ29tbWVudDczNTQ2NDQ5Mw== simonw 9599 2020-11-29T21:57:32Z 2020-11-29T21:57:32Z MEMBER

$ github-to-sqlite workflows github.db simonw/datasette dogsheep/github-to-sqlite

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
github-to-sqlite workflows command 753026003  
735464438 https://github.com/dogsheep/github-to-sqlite/issues/54#issuecomment-735464438 https://api.github.com/repos/dogsheep/github-to-sqlite/issues/54 MDEyOklzc3VlQ29tbWVudDczNTQ2NDQzOA== simonw 9599 2020-11-29T21:57:08Z 2020-11-29T21:57:08Z MEMBER

Inspired by this tweet from Michael Heap https://twitter.com/mheap/status/1333108608817631238

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
github-to-sqlite workflows command 753026003  
727692413 https://github.com/dogsheep/swarm-to-sqlite/issues/11#issuecomment-727692413 https://api.github.com/repos/dogsheep/swarm-to-sqlite/issues/11 MDEyOklzc3VlQ29tbWVudDcyNzY5MjQxMw== simonw 9599 2020-11-16T02:15:22Z 2020-11-16T02:15:22Z MEMBER

Thanks, I'll look into this.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Error thrown: sqlite3.OperationalError: table users has no column named lastName 743400216  
712266834 https://github.com/dogsheep/dogsheep-beta/issues/29#issuecomment-712266834 https://api.github.com/repos/dogsheep/dogsheep-beta/issues/29 MDEyOklzc3VlQ29tbWVudDcxMjI2NjgzNA== simonw 9599 2020-10-19T16:01:23Z 2020-10-19T16:01:23Z MEMBER

Might just be a documented pattern for how to configure this in YAML templates.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Add search highlighting snippets 724759588  
711569063 https://github.com/dogsheep/github-to-sqlite/issues/50#issuecomment-711569063 https://api.github.com/repos/dogsheep/github-to-sqlite/issues/50 MDEyOklzc3VlQ29tbWVudDcxMTU2OTA2Mw== simonw 9599 2020-10-19T05:01:29Z 2020-10-19T05:01:29Z MEMBER

Demo of --accept:

github-to-sqlite get /repos/simonw/datasette/readme --accept 'application/vnd.github.VERSION.html'
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Commands for making authenticated API calls 703218756  
711089647 https://github.com/dogsheep/dogsheep-beta/issues/28#issuecomment-711089647 https://api.github.com/repos/dogsheep/dogsheep-beta/issues/28 MDEyOklzc3VlQ29tbWVudDcxMTA4OTY0Nw== simonw 9599 2020-10-17T22:43:13Z 2020-10-17T22:43:13Z MEMBER

Since my personal Dogsheep uses Datasette authentication, I'm going to need to pass through cookies. https://github.com/simonw/datasette/issues/1020 will solve that in the future but for now I need to solve it explicitly.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Switch to using datasette.client 723861683  
711081703 https://github.com/dogsheep/healthkit-to-sqlite/issues/11#issuecomment-711081703 https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/11 MDEyOklzc3VlQ29tbWVudDcxMTA4MTcwMw== simonw 9599 2020-10-17T21:18:35Z 2020-10-17T21:18:35Z MEMBER

OK, if you upgrade to the just-released 1.0 this should work (it worked against my Spanish export).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
export.xml file name varies with different language settings 723838331  
711079760 https://github.com/dogsheep/healthkit-to-sqlite/issues/11#issuecomment-711079760 https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/11 MDEyOklzc3VlQ29tbWVudDcxMTA3OTc2MA== simonw 9599 2020-10-17T21:00:05Z 2020-10-17T21:00:05Z MEMBER

Checking for either <!DOCTYPE HealthData or <HealthData in the first 1000 bytes should do it.
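A minimal sketch of that check, assuming the file is read in binary mode and that either marker appearing anywhere in the first 1000 bytes is enough. The function names here are hypothetical and not taken from healthkit-to-sqlite itself:

```python
def looks_like_health_export(head: bytes) -> bool:
    """Return True if the leading bytes of a file look like a HealthKit export."""
    # Either marker is accepted, in case the DOCTYPE declaration is missing
    return b"<!DOCTYPE HealthData" in head or b"<HealthData" in head


def sniff_file(path, num_bytes=1000):
    """Read only the first num_bytes of path and test them (hypothetical helper)."""
    # Exports can be very large, so avoid reading the whole file
    with open(path, "rb") as fp:
        return looks_like_health_export(fp.read(num_bytes))
```

Working on bytes rather than decoded text avoids having to guess the encoding before the XML declaration has been parsed.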

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
export.xml file name varies with different language settings 723838331  
711079056 https://github.com/dogsheep/healthkit-to-sqlite/issues/11#issuecomment-711079056 https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/11 MDEyOklzc3VlQ29tbWVudDcxMTA3OTA1Ng== simonw 9599 2020-10-17T20:53:00Z 2020-10-17T20:53:00Z MEMBER

I think the safest thing is to sniff the first few lines of the file. Those should be the same no matter the language that was used:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE HealthData [
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
export.xml file name varies with different language settings 723838331  
711078917 https://github.com/dogsheep/healthkit-to-sqlite/issues/11#issuecomment-711078917 https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/11 MDEyOklzc3VlQ29tbWVudDcxMTA3ODkxNw== simonw 9599 2020-10-17T20:51:55Z 2020-10-17T20:52:03Z MEMBER

I switched my phone to Spanish and ran an export - I got a file called exportar.zip. Unzipped I still got an apple_health_export folder but the root contained:

electrocardiograms/
export_cda.xml
exportar.xml
workout-routes/

It looks like export_cda.xml does not have a translated name, so maybe I can ignore it and look for the other .xml file in that directory.
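A rough sketch of the "ignore export_cda.xml and take the other .xml file" idea, assuming the zip always contains a single top-level folder as in the listing above. find_export_xml is a hypothetical helper, not the tool's actual code:

```python
import zipfile


def find_export_xml(zf: zipfile.ZipFile):
    """Return the first root-level .xml member that is not export_cda.xml, or None."""
    for name in zf.namelist():
        parts = name.split("/")
        # Expect e.g. apple_health_export/exportar.xml: one folder, then the file
        if len(parts) != 2:
            continue
        filename = parts[1]
        if filename.endswith(".xml") and filename != "export_cda.xml":
            return name
    return None
```

This keeps working for any translated file name, at the cost of assuming export_cda.xml is never renamed.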

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
export.xml file name varies with different language settings 723838331  
711074306 https://github.com/dogsheep/healthkit-to-sqlite/issues/11#issuecomment-711074306 https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/11 MDEyOklzc3VlQ29tbWVudDcxMTA3NDMwNg== simonw 9599 2020-10-17T20:16:22Z 2020-10-17T20:16:22Z MEMBER

The "first XML file in the root" solution is probably easier though!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
export.xml file name varies with different language settings 723838331  
711074031 https://github.com/dogsheep/healthkit-to-sqlite/issues/11#issuecomment-711074031 https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/11 MDEyOklzc3VlQ29tbWVudDcxMTA3NDAzMQ== simonw 9599 2020-10-17T20:14:01Z 2020-10-17T20:14:01Z MEMBER

I'd be happy to teach the tool to look for export.xml or eksport.xml - and then expand that list to other languages.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
export.xml file name varies with different language settings 723838331  
706834800 https://github.com/dogsheep/evernote-to-sqlite/issues/5#issuecomment-706834800 https://api.github.com/repos/dogsheep/evernote-to-sqlite/issues/5 MDEyOklzc3VlQ29tbWVudDcwNjgzNDgwMA== simonw 9599 2020-10-12T03:24:57Z 2020-10-16T20:16:28Z MEMBER

Here's my first attempt at a plugin for this:

from datasette import hookimpl
import jinja2

START = "<en-note"
END = "</en-note>"
TEMPLATE = """
<div style="max-width: 500px; white-space: normal; overflow-wrap: break-word;">{}</div>
""".strip()

EN_MEDIA_SCRIPT = """
Array.from(document.querySelectorAll('en-media')).forEach(el => {
    let hash = el.getAttribute('hash');
    let type = el.getAttribute('type');
    let path = `/evernote/resources_data/${hash}.json?_shape=array`;
    fetch(path).then(r => r.json()).then(rows => {
        let b64 = rows[0].data.encoded;
        let data = `data:${type};base64,${b64}`;
        el.innerHTML = `<img style="max-width: 300px" src="${data}">`;
    });
});
"""


@hookimpl
def render_cell(value, table):
    if not table:
        # Don't render content from arbitrary SQL queries, could be XSS hole
        return
    if not value or not isinstance(value, str):
        return
    value = value.strip()
    if value.startswith(START) and value.endswith(END):
        trimmed = value[len(START) : -len(END)]
        trimmed = trimmed.split(">", 1)[1]
        # Replace those horrible double newlines
        trimmed = trimmed.replace("<div><br /></div>", "<br>")
        return jinja2.Markup(TEMPLATE.format(trimmed))


@hookimpl
def extra_body_script():
    return EN_MEDIA_SCRIPT

It works!

It does however demonstrate that Evernote's "clip this webpage" feature means there is a LOT of weird HTML that can get into a note. It looks like they've filtered out the scripts but I wouldn't bet on it - they certainly don't filter out many of the inline styles. So running Bleach is almost certainly a good idea.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Figure out how to display images from <en-media> tags inline in Datasette 718938889  
707332912 https://github.com/dogsheep/swarm-to-sqlite/issues/8#issuecomment-707332912 https://api.github.com/repos/dogsheep/swarm-to-sqlite/issues/8 MDEyOklzc3VlQ29tbWVudDcwNzMzMjkxMg== simonw 9599 2020-10-12T20:35:06Z 2020-10-12T20:35:06Z MEMBER

Shipped a fix for this in swarm-to-sqlite 0.3.2.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Error thrown: table photos has no column named hasSticker 648245071  
706786548 https://github.com/dogsheep/evernote-to-sqlite/issues/4#issuecomment-706786548 https://api.github.com/repos/dogsheep/evernote-to-sqlite/issues/4 MDEyOklzc3VlQ29tbWVudDcwNjc4NjU0OA== simonw 9599 2020-10-11T23:39:46Z 2020-10-11T23:39:46Z MEMBER

Should have used porter stemming for this.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Configure FTS + add an index on the date columns 718938508  
706785201 https://github.com/dogsheep/evernote-to-sqlite/issues/6#issuecomment-706785201 https://api.github.com/repos/dogsheep/evernote-to-sqlite/issues/6 MDEyOklzc3VlQ29tbWVudDcwNjc4NTIwMQ== simonw 9599 2020-10-11T23:29:39Z 2020-10-11T23:29:39Z MEMBER

It looks to me like each of those <item> blocks has a number of guesses in order of confidence:

  <item x="215" y="190" w="187" h="39">
    <t w="57">wonders,</t>
    <t w="55">wanders,</t>
    <t w="52">wonders ?</t>
    <t w="45">wonders</t>
    <t w="42">wonders.</t>
  </item>

So maybe the best approach here is to just take the first t element within each item.
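A small sketch of that approach using the standard library, assuming the recoIndex XML has no namespace (as in the sample) and that the t elements inside each item are already ordered by confidence. best_guess_text is a hypothetical name:

```python
import xml.etree.ElementTree as ET


def best_guess_text(reco_xml: str) -> str:
    """Join the first <t> candidate from each <item> in a recoIndex document."""
    root = ET.fromstring(reco_xml)
    words = []
    for item in root.iter("item"):
        # find() returns the first matching child, i.e. the first listed guess
        first = item.find("t")
        if first is not None and first.text:
            words.append(first.text)
    return " ".join(words)
```

Low-confidence items (the sample has several with w below 25) would still be included; filtering on the w attribute would be a separate decision.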

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Better handling of OCR data 718949182  
706785086 https://github.com/dogsheep/evernote-to-sqlite/issues/6#issuecomment-706785086 https://api.github.com/repos/dogsheep/evernote-to-sqlite/issues/6 MDEyOklzc3VlQ29tbWVudDcwNjc4NTA4Ng== simonw 9599 2020-10-11T23:28:50Z 2020-10-11T23:28:50Z MEMBER

The XML for the OCR stuff is a bit weird. Currently I'm doing this to it:

https://github.com/dogsheep/evernote-to-sqlite/blob/c33d7b043a45eb3e88676e5fa3ce31755199d9f8/evernote_to_sqlite/utils.py#L70-L78

This can produce some odd results, for example:

Sure 'Sure, 'Sure. Sure, Sure. sure sure. sure ? If you If Yau [you live jive In m 1n an area devoid of natural wonders, wanders, wonders ? wonders wonders. your mind will be blown, blown' blown. blown ? -e i ? ,1 IL it ? at ? KY ? fl ft bat at

Which came from this image:

The XML for that is:

<recoIndex docType="unknown" objType="image" objID="05ffb72b307bf495f064243c7099d94f" engineVersion="6.5.17.7" recoType="service" lang="en" objWidth="1000" objHeight="1504">
  <item x="68" y="75" w="104" h="37">
    <t w="60">Sure</t>
    <t w="52">'Sure,</t>
    <t w="47">'Sure.</t>
    <t w="33">Sure,</t>
    <t w="26">Sure.</t>
  </item>
  <item x="182" y="83" w="92" h="26">
    <t w="62">sure</t>
    <t w="58">sure.</t>
    <t w="46">sure ?</t>
  </item>
  <item x="69" y="132" w="107" h="45">
    <t w="81">If you</t>
    <t w="64">If Yau</t>
    <t w="31">[you</t>
  </item>
  <item x="186" y="132" w="67" h="35">
    <t w="85">live</t>
    <t w="51">jive</t>
  </item>
  <item x="263" y="140" w="36" h="27">
    <t w="82">In</t>
    <t w="56">m</t>
    <t w="53">1n</t>
  </item>
  <item x="309" y="140" w="53" h="27">
    <t w="82">an</t>
  </item>
  <item x="372" y="141" w="90" h="26">
    <t w="94">area</t>
  </item>
  <item x="472" y="132" w="138" h="35">
    <t w="85">devoid</t>
  </item>
  <item x="620" y="132" w="43" h="35">
    <t w="82">of</t>
  </item>
  <item x="68" y="190" w="137" h="35">
    <t w="87">natural</t>
  </item>
  <item x="215" y="190" w="187" h="39">
    <t w="57">wonders,</t>
    <t w="55">wanders,</t>
    <t w="52">wonders ?</t>
    <t w="45">wonders</t>
    <t w="42">wonders.</t>
  </item>
  <item x="410" y="198" w="98" h="36">
    <t w="88">your</t>
  </item>
  <item x="518" y="190" w="102" h="35">
    <t w="86">mind</t>
  </item>
  <item x="630" y="190" w="69" h="34">
    <t w="87">will</t>
  </item>
  <item x="709" y="190" w="55" h="35">
    <t w="82">be</t>
  </item>
  <item x="774" y="190" w="137" h="34">
    <t w="56">blown,</t>
    <t w="55">blown'</t>
    <t w="48">blown.</t>
    <t w="48">blown ?</t>
  </item>
  <item x="166" y="736" w="8" h="6">
    <t w="66">-e</t>
  </item>
  <item x="273" y="966" w="29" h="21">
    <t w="11">i ?</t>
  </item>
  <item x="281" y="1004" w="28" h="11">
    <t w="11">,1</t>
  </item>
  <item x="512" y="1083" w="10" h="7">
    <t w="10">IL</t>
  </item>
  <item x="29" y="1447" w="7" h="23">
    <t w="17">it ?</t>
    <t w="15">at ?</t>
    <t w="13">KY ?</t>
  </item>
  <item x="414" y="841" w="8" h="16">
    <t w="22">fl</t>
    <t w="20">ft</t>
    <t w="20">bat</t>
    <t w="19">at</t>
  </item>
</recoIndex>
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Better handling of OCR data 718949182  
706784028 https://github.com/dogsheep/evernote-to-sqlite/issues/4#issuecomment-706784028 https://api.github.com/repos/dogsheep/evernote-to-sqlite/issues/4 MDEyOklzc3VlQ29tbWVudDcwNjc4NDAyOA== simonw 9599 2020-10-11T23:20:32Z 2020-10-11T23:20:32Z MEMBER

I haven't done the FTS on OCR yet. I'm going to move that to another ticket because it requires more thought.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Configure FTS + add an index on the date columns 718938508  
706776808 https://github.com/dogsheep/evernote-to-sqlite/issues/5#issuecomment-706776808 https://api.github.com/repos/dogsheep/evernote-to-sqlite/issues/5 MDEyOklzc3VlQ29tbWVudDcwNjc3NjgwOA== simonw 9599 2020-10-11T22:23:14Z 2020-10-11T22:23:14Z MEMBER

... but it's still important to be able to get to the rendered note directly from the browse notes /evernote/notes page. Maybe use a simple render_cell() hook that just knows how to generate the link to the rendered note page?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Figure out how to display images from <en-media> tags inline in Datasette 718938889  
706776680 https://github.com/dogsheep/evernote-to-sqlite/issues/5#issuecomment-706776680 https://api.github.com/repos/dogsheep/evernote-to-sqlite/issues/5 MDEyOklzc3VlQ29tbWVudDcwNjc3NjY4MA== simonw 9599 2020-10-11T22:22:16Z 2020-10-11T22:22:16Z MEMBER

Maybe the best way to do this is with a custom route, /-/evernote/note-id - that way I can clean the HTML and resolve the other things in the <en-note> structure without using render_cell() and the like. My concern about using render_cell() is that it could lead to weird security problems when combined with ?sql= queries.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Figure out how to display images from <en-media> tags inline in Datasette 718938889  
706776447 https://github.com/dogsheep/evernote-to-sqlite/issues/5#issuecomment-706776447 https://api.github.com/repos/dogsheep/evernote-to-sqlite/issues/5 MDEyOklzc3VlQ29tbWVudDcwNjc3NjQ0Nw== simonw 9599 2020-10-11T22:20:32Z 2020-10-11T22:20:32Z MEMBER

Or... I could do this client-side. JavaScript that looks for <en-media> tags and fetches the data using fetch() wouldn't be too hard to write.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Figure out how to display images from <en-media> tags inline in Datasette 718938889  
706776242 https://github.com/dogsheep/evernote-to-sqlite/issues/5#issuecomment-706776242 https://api.github.com/repos/dogsheep/evernote-to-sqlite/issues/5 MDEyOklzc3VlQ29tbWVudDcwNjc3NjI0Mg== simonw 9599 2020-10-11T22:18:30Z 2020-10-11T22:19:48Z MEMBER

Alternatively, rather than relying on datasette-media this could base64-embed the images. evernote-to-sqlite could register itself as a Datasette plugin that knows how to do this.

Maybe rename the column to evernote_content and register a render cell hook that knows how to rewrite those note bodies so that they are visible?

Might need to feed them through Bleach too, just in case any nasty code can get into them.
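For the base64-embedding idea, here is a minimal sketch of building an inline image tag from raw bytes. It assumes the resource bytes and MIME type have already been looked up; data_uri_img is a hypothetical helper and does no sanitising of the surrounding note HTML:

```python
import base64
from html import escape


def data_uri_img(data: bytes, mime_type: str, width=None) -> str:
    """Return an <img> tag whose src is a base64 data URI for the given bytes."""
    b64 = base64.b64encode(data).decode("ascii")
    # Only emit a width attribute when one was supplied
    width_attr = f' width="{int(width)}"' if width else ""
    src = f"data:{escape(mime_type, quote=True)};base64,{b64}"
    return f'<img src="{src}"{width_attr}>'
```

Embedding full-size images this way makes every page that shows the note much larger, which is one argument for fetching them on demand instead.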

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Figure out how to display images from <en-media> tags inline in Datasette 718938889  
706776180 https://github.com/dogsheep/evernote-to-sqlite/issues/5#issuecomment-706776180 https://api.github.com/repos/dogsheep/evernote-to-sqlite/issues/5 MDEyOklzc3VlQ29tbWVudDcwNjc3NjE4MA== simonw 9599 2020-10-11T22:17:55Z 2020-10-11T22:17:55Z MEMBER

We could even do server-side thumbnailing for some of these images, but I'm inclined to serve up the full size ones and set a width on the image element based on the width attribute on <en-media>.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Figure out how to display images from <en-media> tags inline in Datasette 718938889  
706775706 https://github.com/dogsheep/evernote-to-sqlite/issues/1#issuecomment-706775706 https://api.github.com/repos/dogsheep/evernote-to-sqlite/issues/1 MDEyOklzc3VlQ29tbWVudDcwNjc3NTcwNg== simonw 9599 2020-10-11T22:14:00Z 2020-10-11T22:14:00Z MEMBER

A live demo would be good too.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Documentation on how to use this with Datasette 718934942  
704553385 https://github.com/dogsheep/github-to-sqlite/pull/48#issuecomment-704553385 https://api.github.com/repos/dogsheep/github-to-sqlite/issues/48 MDEyOklzc3VlQ29tbWVudDcwNDU1MzM4NQ== simonw 9599 2020-10-06T21:07:44Z 2020-10-06T21:07:44Z MEMBER

Sorry for not looking at this sooner, trying it out now - pull request looks great!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Add pull requests 681228542  
695879531 https://github.com/dogsheep/dogsheep-beta/issues/26#issuecomment-695879531 https://api.github.com/repos/dogsheep/dogsheep-beta/issues/26 MDEyOklzc3VlQ29tbWVudDY5NTg3OTUzMQ== simonw 9599 2020-09-21T02:55:28Z 2020-09-21T02:55:54Z MEMBER
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Pagination 705215230  
695879237 https://github.com/dogsheep/dogsheep-beta/issues/26#issuecomment-695879237 https://api.github.com/repos/dogsheep/dogsheep-beta/issues/26 MDEyOklzc3VlQ29tbWVudDY5NTg3OTIzNw== simonw 9599 2020-09-21T02:53:29Z 2020-09-21T02:53:29Z MEMBER

If previous page ended at 2018-02-11T16:32:53+00:00:

select
  search_index.rowid,
  search_index.type,
  search_index.key,
  search_index.title,
  search_index.category,
  search_index.timestamp,
  search_index.search_1
from
  search_index
where
  date("timestamp") = '2018-02-11'
  and timestamp < '2018-02-11T16:32:53+00:00'
order by
  search_index.timestamp desc, rowid
limit 41
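A sketch of how a query like this could drive keyset pagination from Python, assuming a page size of 40 with one extra row fetched to detect whether another page exists (one reading of the limit 41 above). The fetch_page helper and its reduced column list are hypothetical:

```python
import sqlite3


def fetch_page(conn, day, before=None, page_size=40):
    """Return (rows, next_before) for one page of results on the given day."""
    sql = "select rowid, title, timestamp from search_index where date(timestamp) = ?"
    params = [day]
    if before is not None:
        # Keyset condition: only rows strictly older than the previous page's last row
        sql += " and timestamp < ?"
        params.append(before)
    sql += " order by timestamp desc, rowid limit ?"
    params.append(page_size + 1)
    rows = conn.execute(sql, params).fetchall()
    has_next = len(rows) > page_size
    rows = rows[:page_size]
    # The last visible row's timestamp becomes the cursor for the next page
    next_before = rows[-1][2] if has_next else None
    return rows, next_before
```

One limitation of comparing on timestamp alone: rows that share the exact boundary timestamp would be skipped. A stricter version would compare on the (timestamp, rowid) pair.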
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Pagination 705215230  
695877627 https://github.com/dogsheep/dogsheep-beta/issues/16#issuecomment-695877627 https://api.github.com/repos/dogsheep/dogsheep-beta/issues/16 MDEyOklzc3VlQ29tbWVudDY5NTg3NzYyNw== simonw 9599 2020-09-21T02:42:29Z 2020-09-21T02:42:29Z MEMBER

Fun twist: assuming timestamp is always stored as UTC, I need the interface to be timezone aware so I can see, for example, everything from 4th July 2020 as that day is defined in the San Francisco timezone.
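One way to approach this is to turn the local calendar day into a pair of UTC bounds and filter on those. A minimal sketch, assuming timestamps are stored as UTC; utc_bounds_for_local_day is a hypothetical helper:

```python
from datetime import date, datetime, time, timedelta, timezone, tzinfo


def utc_bounds_for_local_day(day: date, tz: tzinfo):
    """Return (start, end) UTC datetimes covering the given calendar day in tz."""
    # Midnight at the start of the day and of the following day, in local time
    start_local = datetime.combine(day, time.min, tzinfo=tz)
    end_local = datetime.combine(day + timedelta(days=1), time.min, tzinfo=tz)
    return start_local.astimezone(timezone.utc), end_local.astimezone(timezone.utc)
```

Passing zoneinfo.ZoneInfo("America/Los_Angeles") from the standard library (Python 3.9+) as tz should account for daylight saving time; the resulting bounds can then be used as timestamp >= start and timestamp < end.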

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Timeline view 694493566  
695875274 https://github.com/dogsheep/dogsheep-beta/issues/26#issuecomment-695875274 https://api.github.com/repos/dogsheep/dogsheep-beta/issues/26 MDEyOklzc3VlQ29tbWVudDY5NTg3NTI3NA== simonw 9599 2020-09-21T02:28:58Z 2020-09-21T02:28:58Z MEMBER

Datasette's implementation is complex because it has to support compound primary keys: https://github.com/simonw/datasette/blob/a258339a935d8d29a95940ef1db01e98bb85ae63/datasette/utils/__init__.py#L88-L114 - but that's not something that's needed for dogsheep-beta.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Pagination 705215230  
695856967 https://github.com/dogsheep/dogsheep-beta/issues/26#issuecomment-695856967 https://api.github.com/repos/dogsheep/dogsheep-beta/issues/26 MDEyOklzc3VlQ29tbWVudDY5NTg1Njk2Nw== simonw 9599 2020-09-21T00:26:59Z 2020-09-21T00:26:59Z MEMBER

It's a shame Datasette doesn't currently have an easy way to implement keyset pagination sorted by rank using a TableView or QueryView. I'll have to do this using the custom SQL query constructed in the plugin: https://github.com/dogsheep/dogsheep-beta/blob/bed9df2b3ef68189e2e445427721a28f4e9b4887/dogsheep_beta/__init__.py#L8-L43

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Pagination 705215230  
695856398 https://github.com/dogsheep/dogsheep-beta/issues/26#issuecomment-695856398 https://api.github.com/repos/dogsheep/dogsheep-beta/issues/26 MDEyOklzc3VlQ29tbWVudDY5NTg1NjM5OA== simonw 9599 2020-09-21T00:22:20Z 2020-09-21T00:22:20Z MEMBER

I'm going to try for keyset pagination sorted by relevance just as a learning exercise.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Pagination 705215230  
695855723 https://github.com/dogsheep/dogsheep-beta/issues/26#issuecomment-695855723 https://api.github.com/repos/dogsheep/dogsheep-beta/issues/26 MDEyOklzc3VlQ29tbWVudDY5NTg1NTcyMw== simonw 9599 2020-09-21T00:16:52Z 2020-09-21T00:17:53Z MEMBER

It feels a bit weird to implement keyset pagination against results sorted by rank because the ranks could change substantially if the search index gets updated while the user is paginating.

I may just ignore that though. If you want reliable pagination you can get it by sorting by date. Maybe it doesn't even make sense to offer pagination if you sort by relevance?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Pagination 705215230  
695855646 https://github.com/dogsheep/dogsheep-beta/issues/26#issuecomment-695855646 https://api.github.com/repos/dogsheep/dogsheep-beta/issues/26 MDEyOklzc3VlQ29tbWVudDY5NTg1NTY0Ng== simonw 9599 2020-09-21T00:16:11Z 2020-09-21T00:16:11Z MEMBER

Should I do this with offset/limit or should I do proper keyset pagination?

I think keyset because then it will work well for the full search interface with no filters or search string.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Pagination 705215230  
695851036 https://github.com/dogsheep/dogsheep-beta/issues/16#issuecomment-695851036 https://api.github.com/repos/dogsheep/dogsheep-beta/issues/16 MDEyOklzc3VlQ29tbWVudDY5NTg1MTAzNg== simonw 9599 2020-09-20T23:34:57Z 2020-09-20T23:34:57Z MEMBER

Really basic starting point is to add facet by date.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Timeline view 694493566  
695124698 https://github.com/dogsheep/dogsheep-beta/issues/15#issuecomment-695124698 https://api.github.com/repos/dogsheep/dogsheep-beta/issues/15 MDEyOklzc3VlQ29tbWVudDY5NTEyNDY5OA== simonw 9599 2020-09-18T23:17:38Z 2020-09-18T23:17:38Z MEMBER

This can be part of the demo instance in #6.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Add a bunch of config examples 694136490  
695113871 https://github.com/dogsheep/dogsheep-beta/issues/24#issuecomment-695113871 https://api.github.com/repos/dogsheep/dogsheep-beta/issues/24 MDEyOklzc3VlQ29tbWVudDY5NTExMzg3MQ== simonw 9599 2020-09-18T22:30:17Z 2020-09-18T22:30:17Z MEMBER

I think I know what's going on here:

https://github.com/dogsheep/dogsheep-beta/blob/0f1b951c5131d16f3c8559a8e4d79ed5c559e3cb/dogsheep_beta/__init__.py#L166-L171

This is a logic bug - the compiled variable could be the template from the previous loop!
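
The general shape of this kind of bug, as a hypothetical sketch rather than the linked code: a variable assigned inside a conditional within a loop keeps its value from the previous iteration when the condition is false. string.Template is a stdlib stand-in for a Jinja template here, and the function and variable names are invented for illustration.

```python
from string import Template  # stand-in for a Jinja template, to keep the sketch stdlib-only

def render_results_buggy(results, templates_by_type):
    outputs = []
    for result in results:
        source = templates_by_type.get(result["type"])
        if source:
            compiled = Template(source)
        # Bug: if this type has no template, `compiled` still holds the
        # template compiled for an earlier result (or is unbound on the first pass)
        outputs.append(compiled.substitute(result))
    return outputs

def render_results_fixed(results, templates_by_type):
    outputs = []
    for result in results:
        compiled = None  # reset on every iteration
        source = templates_by_type.get(result["type"])
        if source:
            compiled = Template(source)
        outputs.append(compiled.substitute(result) if compiled else None)
    return outputs

results = [
    {"type": "tweets", "title": "Tweet by @example"},
    {"type": "photos", "title": "A photo"},
]
templates = {"tweets": "<p>$title</p>"}
buggy = render_results_buggy(results, templates)
fixed = render_results_fixed(results, templates)
```

In the buggy version the photo is rendered with the tweet template; with a real template that expects tweet-only fields, that is the kind of mismatch that could surface as an Undefined value.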

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
the JSON object must be str, bytes or bytearray, not 'Undefined' 703970814  
695109140 https://github.com/dogsheep/dogsheep-beta/issues/25#issuecomment-695109140 https://api.github.com/repos/dogsheep/dogsheep-beta/issues/25 MDEyOklzc3VlQ29tbWVudDY5NTEwOTE0MA== simonw 9599 2020-09-18T22:12:20Z 2020-09-18T22:12:20Z MEMBER
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
template_debug mechanism 704685890  
695108895 https://github.com/dogsheep/dogsheep-beta/issues/25#issuecomment-695108895 https://api.github.com/repos/dogsheep/dogsheep-beta/issues/25 MDEyOklzc3VlQ29tbWVudDY5NTEwODg5NQ== simonw 9599 2020-09-18T22:11:32Z 2020-09-18T22:11:32Z MEMBER

I'm going to make this a new plugin configuration setting, template_debug.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
template_debug mechanism 704685890  
694557425 https://github.com/dogsheep/dogsheep-beta/issues/24#issuecomment-694557425 https://api.github.com/repos/dogsheep/dogsheep-beta/issues/24 MDEyOklzc3VlQ29tbWVudDY5NDU1NzQyNQ== simonw 9599 2020-09-17T23:41:01Z 2020-09-17T23:41:01Z MEMBER

I removed all of the json.loads() calls and I'm still getting that Undefined error.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
the JSON object must be str, bytes or bytearray, not 'Undefined' 703970814  
694554584 https://github.com/dogsheep/dogsheep-beta/issues/24#issuecomment-694554584 https://api.github.com/repos/dogsheep/dogsheep-beta/issues/24 MDEyOklzc3VlQ29tbWVudDY5NDU1NDU4NA== simonw 9599 2020-09-17T23:31:25Z 2020-09-17T23:31:25Z MEMBER

I'd prefer it if errors in these template fragments were displayed as errors inline where the fragment should have been inserted, rather than 500ing the whole page - especially since the template fragments are user-provided and could have all kinds of odd errors in them which should be as easy to debug as possible.
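
A rough sketch of that behaviour, assuming each fragment is rendered through a callable: catch the exception and return an escaped error message in place of the fragment. render_fragment and its debug flag are hypothetical names, not part of the plugin.

```python
import html

def render_fragment(render, context, debug=False):
    """Call render(context); on failure return an inline error instead of raising."""
    try:
        return render(context)
    except Exception as ex:
        if debug:
            # Show the exception type and message to whoever is editing the template
            message = "{}: {}".format(type(ex).__name__, ex)
        else:
            message = "Error rendering this result"
        # Escape the message because it may contain user-provided template text
        return '<pre class="error">{}</pre>'.format(html.escape(message))

def broken(context):
    raise TypeError("the JSON object must be str, bytes or bytearray, not Undefined")

inline = render_fragment(broken, {}, debug=True)
quiet = render_fragment(broken, {})
ok = render_fragment(lambda context: "<p>fine</p>", {})
```

Only the failing result is replaced by the error block; the rest of the page renders as usual.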

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
the JSON object must be str, bytes or bytearray, not 'Undefined' 703970814  
694553579 https://github.com/dogsheep/dogsheep-beta/issues/24#issuecomment-694553579 https://api.github.com/repos/dogsheep/dogsheep-beta/issues/24 MDEyOklzc3VlQ29tbWVudDY5NDU1MzU3OQ== simonw 9599 2020-09-17T23:28:37Z 2020-09-17T23:28:37Z MEMBER

More investigation in pdb:

(dogsheep-beta) dogsheep-beta % datasette . --get '/-/beta?q=pycon&sort=oldest' --pdb
> /usr/local/opt/python@3.8/Frameworks/Python.framework/Versions/3.8/lib/python3.8/json/__init__.py(341)loads()
-> raise TypeError(f'the JSON object must be str, bytes or bytearray, '
(Pdb) list
336             if s.startswith('\ufeff'):
337                 raise JSONDecodeError("Unexpected UTF-8 BOM (decode using utf-8-sig)",
338                                       s, 0)
339         else:
340             if not isinstance(s, (bytes, bytearray)):
341  ->             raise TypeError(f'the JSON object must be str, bytes or bytearray, '
342                                 f'not {s.__class__.__name__}')
343             s = s.decode(detect_encoding(s), 'surrogatepass')
344     
345         if "encoding" in kw:
346             import warnings
(Pdb) bytes
<class 'bytes'>
(Pdb) locals()['s']
Undefined
(Pdb) type(locals()['s'])
<class 'jinja2.runtime.Undefined'>
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
the JSON object must be str, bytes or bytearray, not 'Undefined' 703970814  
694552681 https://github.com/dogsheep/dogsheep-beta/issues/24#issuecomment-694552681 https://api.github.com/repos/dogsheep/dogsheep-beta/issues/24 MDEyOklzc3VlQ29tbWVudDY5NDU1MjY4MQ== simonw 9599 2020-09-17T23:25:54Z 2020-09-17T23:25:54Z MEMBER

This is the template fragment it's rendering:

            <div style="overflow: hidden;">
              <p>Tweet by <a href="https://twitter.com/{{ display.screen_name }}">@{{ display.screen_name }}</a> ({{ display.user_name }}, {{ "{:,}".format(display.followers_count or 0) }} followers)
                on <a href="https://twitter.com/{{ display.screen_name }}/status/{{ display.tweet_id }}">{{ display.created_at }}</a></p>
              </p>
              <blockquote>{{ display.full_text }}</blockquote>
              {% if display.media_urls and json.loads(display.media_urls) %}
                {% for url in json.loads(display.media_urls) %}
                  <img src="{{ url }}" style="height: 200px;">
                {% endfor %}
              {% endif %}
            </div>
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
the JSON object must be str, bytes or bytearray, not 'Undefined' 703970814  
694552393 https://github.com/dogsheep/dogsheep-beta/issues/24#issuecomment-694552393 https://api.github.com/repos/dogsheep/dogsheep-beta/issues/24 MDEyOklzc3VlQ29tbWVudDY5NDU1MjM5Mw== simonw 9599 2020-09-17T23:25:01Z 2020-09-17T23:25:17Z MEMBER

Ran locals() in the debugger:
{'range': <class 'range'>, 'dict': <class 'dict'>, 'lipsum': <function generate_lorem_ipsum at 0x10aeff430>, 'cycler': <class 'jinja2.utils.Cycler'>, 'joiner': <class 'jinja2.utils.Joiner'>, 'namespace': <class 'jinja2.utils.Namespace'>, 'rank': -9.383801886431414, 'rowid': 14297, 'type': 'twitter.db/tweets', 'key': '312658917933076480', 'title': 'Tweet by @chrisstreeter', 'category': 2, 'timestamp': '2013-03-15T20:17:49+00:00', 'search_1': '@simonw are you at pycon? Would love to meet you.', 'display': {'avatar_url': 'https://pbs.twimg.com/profile_images/806275088597204993/38yLHfJi_normal.jpg', 'user_name': 'Chris Streeter', 'screen_name': 'chrisstreeter', 'followers_count': 280, 'tweet_id': 312658917933076480, 'created_at': '2013-03-15T20:17:49+00:00', 'full_text': '@simonw are you at pycon? Would love to meet you.', 'media_urls_2': '[]', 'media_urls': '[]'}, 'json': <module 'json' from '/usr/local/opt/python@3.8/Frameworks/Python.framework/Versions/3.8/lib/python3.8/json/__init__.py'>}

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
the JSON object must be str, bytes or bytearray, not 'Undefined' 703970814  
694551646 https://github.com/dogsheep/dogsheep-beta/issues/24#issuecomment-694551646 https://api.github.com/repos/dogsheep/dogsheep-beta/issues/24 MDEyOklzc3VlQ29tbWVudDY5NDU1MTY0Ng== simonw 9599 2020-09-17T23:22:48Z 2020-09-17T23:22:48Z MEMBER

Looks like it's happening in a Jinja fragment template for one of the results:

  /Users/simon/Dropbox/Development/dogsheep-beta/dogsheep_beta/__init__.py(169)process_results()
-> output = compiled.render({**result, **{"json": json}})
  /Users/simon/.local/share/virtualenvs/dogsheep-beta-u_po4Rpj/lib/python3.8/site-packages/jinja2/asyncsupport.py(71)render()
-> return original_render(self, *args, **kwargs)
  /Users/simon/.local/share/virtualenvs/dogsheep-beta-u_po4Rpj/lib/python3.8/site-packages/jinja2/environment.py(1090)render()
-> self.environment.handle_exception()
  /Users/simon/.local/share/virtualenvs/dogsheep-beta-u_po4Rpj/lib/python3.8/site-packages/jinja2/environment.py(832)handle_exception()
-> reraise(*rewrite_traceback_stack(source=source))
  /Users/simon/.local/share/virtualenvs/dogsheep-beta-u_po4Rpj/lib/python3.8/site-packages/jinja2/_compat.py(28)reraise()
-> raise value.with_traceback(tb)
  <template>(5)top-level template code()
> /usr/local/opt/python@3.8/Frameworks/Python.framework/Versions/3.8/lib/python3.8/json/__init__.py(341)loads()
-> raise TypeError(f'the JSON object must be str, bytes or bytearray, '
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
the JSON object must be str, bytes or bytearray, not 'Undefined' 703970814  
694551406 https://github.com/dogsheep/dogsheep-beta/issues/24#issuecomment-694551406 https://api.github.com/repos/dogsheep/dogsheep-beta/issues/24 MDEyOklzc3VlQ29tbWVudDY5NDU1MTQwNg== simonw 9599 2020-09-17T23:22:07Z 2020-09-17T23:22:07Z MEMBER

Neat, I can debug this with the new --pdb option:

datasette . --get '/-/beta?q=pycon&sort=oldest' --pdb
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
the JSON object must be str, bytes or bytearray, not 'Undefined' 703970814  
694548909 https://github.com/dogsheep/dogsheep-beta/issues/16#issuecomment-694548909 https://api.github.com/repos/dogsheep/dogsheep-beta/issues/16 MDEyOklzc3VlQ29tbWVudDY5NDU0ODkwOQ== simonw 9599 2020-09-17T23:15:09Z 2020-09-17T23:15:09Z MEMBER

I have sort by date now, #21.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Timeline view 694493566  
693794700 https://github.com/dogsheep/github-to-sqlite/issues/50#issuecomment-693794700 https://api.github.com/repos/dogsheep/github-to-sqlite/issues/50 MDEyOklzc3VlQ29tbWVudDY5Mzc5NDcwMA== simonw 9599 2020-09-17T04:02:39Z 2020-09-17T04:02:39Z MEMBER

It would be useful if you could pass an --accept option to this.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Commands for making authenticated API calls 703218756  
693789129 https://github.com/dogsheep/github-to-sqlite/issues/50#issuecomment-693789129 https://api.github.com/repos/dogsheep/github-to-sqlite/issues/50 MDEyOklzc3VlQ29tbWVudDY5Mzc4OTEyOQ== simonw 9599 2020-09-17T03:40:01Z 2020-09-17T03:40:01Z MEMBER

Bug with endpoints that return dictionaries rather than arrays:

github-to-sqlite get /users/simonw
[
    "login",
    "id",
    "node_id",
    "avatar_url",
    "gravatar_id",
    "url",
    "html_url",
    "followers_url",
    "following_url",
    "gists_url",
    "starred_url",
    "subscriptions_url",
    "organizations_url",
    "repos_url",
    "events_url",
    "received_events_url",
    "type",
    "site_admin",
    "name",
    "company",
    "blog",
    "location",
    "email",
    "hireable",
    "bio",
    "twitter_username",
    "public_repos",
    "public_gists",
    "followers",
    "following",
    "created_at",
    "updated_at"
]
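
The output above is just the object's keys, which is what iterating over a Python dict produces. Assuming that is the cause, a sketch of one possible fix is to wrap a single object in a list before iterating; items_from_response is a hypothetical helper, not the tool's actual code.

```python
import json

def items_from_response(data):
    """Normalise a decoded JSON response to a list of items."""
    # Iterating a dict yields its keys, so wrap single objects in a list
    if isinstance(data, dict):
        return [data]
    return list(data)

user = json.loads('{"login": "simonw", "id": 9599}')
buggy = list(user)                 # what naive iteration would emit
fixed = items_from_response(user)  # the whole object, as a one-item list
```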
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Commands for making authenticated API calls 703218756  
693788387 https://github.com/dogsheep/github-to-sqlite/issues/50#issuecomment-693788387 https://api.github.com/repos/dogsheep/github-to-sqlite/issues/50 MDEyOklzc3VlQ29tbWVudDY5Mzc4ODM4Nw== simonw 9599 2020-09-17T03:36:47Z 2020-09-17T03:36:58Z MEMBER

Fun demo of the --nl option:

github-to-sqlite get /users/simonw/repos --paginate --nl | sqlite-utils insert simonw.db repos - --nl
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Commands for making authenticated API calls 703218756  
693788032 https://github.com/dogsheep/github-to-sqlite/issues/50#issuecomment-693788032 https://api.github.com/repos/dogsheep/github-to-sqlite/issues/50 MDEyOklzc3VlQ29tbWVudDY5Mzc4ODAzMg== simonw 9599 2020-09-17T03:35:22Z 2020-09-17T03:35:22Z MEMBER
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Commands for making authenticated API calls 703218756  
693775622 https://github.com/dogsheep/github-to-sqlite/issues/50#issuecomment-693775622 https://api.github.com/repos/dogsheep/github-to-sqlite/issues/50 MDEyOklzc3VlQ29tbWVudDY5Mzc3NTYyMg== simonw 9599 2020-09-17T02:48:34Z 2020-09-17T02:48:34Z MEMBER

I'd like a --paginate option that does the same thing as https://github.com/simonw/paginate-json

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Commands for making authenticated API calls 703218756  
693773191 https://github.com/dogsheep/github-to-sqlite/issues/50#issuecomment-693773191 https://api.github.com/repos/dogsheep/github-to-sqlite/issues/50 MDEyOklzc3VlQ29tbWVudDY5Mzc3MzE5MQ== simonw 9599 2020-09-17T02:39:26Z 2020-09-17T02:39:26Z MEMBER

I'm going to start with github-to-sqlite get and github-to-sqlite post - I may add put and suchlike later on.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Commands for making authenticated API calls 703218756  
689226390 https://github.com/dogsheep/dogsheep-beta/issues/17#issuecomment-689226390 https://api.github.com/repos/dogsheep/dogsheep-beta/issues/17 MDEyOklzc3VlQ29tbWVudDY4OTIyNjM5MA== simonw 9599 2020-09-09T00:36:07Z 2020-09-09T00:36:07Z MEMBER

Alternative names:

  • type
  • record_type
  • doctype

I think type is right. It matches what Elasticsearch used to call their equivalent of this (before they removed the feature!). https://www.elastic.co/guide/en/elasticsearch/reference/current/removal-of-types.html

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Rename "table" to "type" 694500679  
688626037 https://github.com/dogsheep/dogsheep-beta/issues/19#issuecomment-688626037 https://api.github.com/repos/dogsheep/dogsheep-beta/issues/19 MDEyOklzc3VlQ29tbWVudDY4ODYyNjAzNw== simonw 9599 2020-09-08T05:27:07Z 2020-09-08T05:27:07Z MEMBER

A really clever way to do this would be with triggers. The indexer script would add triggers to each of the database tables that it is indexing - each in their own database.

Those triggers would then maintain a _index_queue_ table. This table would record the primary key of rows that are added, modified or deleted. The indexer could then work by reading through the _index_queue_ table, re-indexing (or deleting) just the primary keys listed there, and then emptying the queue once it has finished.

This would add a small amount of overhead to insert/update/delete queries run against the table. My hunch is that the overhead would be minuscule, but I could still allow people to opt out for tables that are so high traffic that this would matter.
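
A minimal sketch of the trigger idea using Python's sqlite3 module. The tweets table, the trigger names, and the operation column are assumptions for illustration; only the _index_queue_ table name comes from the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table tweets (id integer primary key, full_text text);
-- Queue of primary keys that need re-indexing
create table _index_queue_ (pk integer, operation text);

create trigger tweets_ai after insert on tweets begin
  insert into _index_queue_ (pk, operation) values (new.id, 'insert');
end;
create trigger tweets_au after update on tweets begin
  insert into _index_queue_ (pk, operation) values (new.id, 'update');
end;
create trigger tweets_ad after delete on tweets begin
  insert into _index_queue_ (pk, operation) values (old.id, 'delete');
end;
""")

conn.execute("insert into tweets (id, full_text) values (1, 'hello')")
conn.execute("update tweets set full_text = 'hello again' where id = 1")
conn.execute("delete from tweets where id = 1")

# The indexer would read the queue, re-index or delete each key, then empty it
queue = conn.execute(
    "select pk, operation from _index_queue_ order by rowid"
).fetchall()
conn.execute("delete from _index_queue_")
remaining = conn.execute("select count(*) from _index_queue_").fetchone()[0]
```

Each write to the source table costs one extra insert into the queue, which is the overhead mentioned above.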

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Figure out incremental re-indexing 695556681  
688625430 https://github.com/dogsheep/dogsheep-beta/issues/19#issuecomment-688625430 https://api.github.com/repos/dogsheep/dogsheep-beta/issues/19 MDEyOklzc3VlQ29tbWVudDY4ODYyNTQzMA== simonw 9599 2020-09-08T05:24:50Z 2020-09-08T05:24:50Z MEMBER

I thought about allowing tables to define an incremental indexing SQL query - maybe something that can return just records touched in the past hour, or records since a recorded "last indexed record" value.

The problem with this is deletes - if you delete a record, how does the indexer know to remove it? See #18 - that's already caused problems.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Figure out incremental re-indexing 695556681  
688623097 https://github.com/dogsheep/dogsheep-beta/issues/18#issuecomment-688623097 https://api.github.com/repos/dogsheep/dogsheep-beta/issues/18 MDEyOklzc3VlQ29tbWVudDY4ODYyMzA5Nw== simonw 9599 2020-09-08T05:15:51Z 2020-09-08T05:15:51Z MEMBER

I'm inclined to go with the first, simpler option. I have longer term plans for efficient incremental index updates based on clever trickery with triggers.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Deleted records stay in the search index 695553522  
688622995 https://github.com/dogsheep/dogsheep-beta/issues/18#issuecomment-688622995 https://api.github.com/repos/dogsheep/dogsheep-beta/issues/18 MDEyOklzc3VlQ29tbWVudDY4ODYyMjk5NQ== simonw 9599 2020-09-08T05:15:21Z 2020-09-08T05:15:21Z MEMBER

Alternatively it could run as it does now but add a DELETE FROM index1.search_index WHERE key not in (select key from ...).

I'm not sure which would be more efficient.
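
A sketch of that DELETE approach, assuming the index lives in a database attached as index1 and that keys are stored as text. The tweets source table, the type filter, and the cast are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A second in-memory database stands in for the separate index file
conn.execute("attach database ':memory:' as index1")
conn.executescript("""
create table tweets (id integer primary key, full_text text);
create table index1.search_index (type text, key text, title text);
insert into tweets (id, full_text) values (1, 'still here');
insert into index1.search_index (type, key, title) values
  ('tweets', '1', 'still here'),
  ('tweets', '2', 'deleted from source');
""")

# Remove index entries whose key no longer exists in the source table
conn.execute(
    """
    delete from index1.search_index
    where type = 'tweets'
      and key not in (select cast(id as text) from tweets)
    """
)
remaining = conn.execute("select key from index1.search_index").fetchall()
```

Restricting the delete by type keeps it from touching entries that were indexed from other tables.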

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Deleted records stay in the search index 695553522  
687880459 https://github.com/dogsheep/dogsheep-beta/issues/17#issuecomment-687880459 https://api.github.com/repos/dogsheep/dogsheep-beta/issues/17 MDEyOklzc3VlQ29tbWVudDY4Nzg4MDQ1OQ== simonw 9599 2020-09-06T19:36:32Z 2020-09-06T19:36:32Z MEMBER

At some point I may even want to support search types which are indexed from (and inflated from) more than one database file. I'm going to ignore that for the moment though.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Rename "table" to "type" 694500679  

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id]),
   [performed_via_github_app] TEXT
);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);