html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,issue,performed_via_github_app
https://github.com/dogsheep/dogsheep-photos/issues/18#issuecomment-624364557,https://api.github.com/repos/dogsheep/dogsheep-photos/issues/18,624364557,MDEyOklzc3VlQ29tbWVudDYyNDM2NDU1Nw==,9599,2020-05-05T23:49:18Z,2020-05-05T23:49:18Z,MEMBER,Label is `macos-latest`,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",612860758,
https://github.com/dogsheep/dogsheep-photos/issues/17#issuecomment-624278714,https://api.github.com/repos/dogsheep/dogsheep-photos/issues/17,624278714,MDEyOklzc3VlQ29tbWVudDYyNDI3ODcxNA==,9599,2020-05-05T20:07:19Z,2020-05-05T20:07:19Z,MEMBER,"From https://hynek.me/articles/conditional-python-dependencies/ I think this will look like:
```python
setup(
    # ...
    install_requires=[
        # ...
        ""osxphotos>=0.28.13 ; sys_platform=='darwin'"",
    ]
)
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",612860531,
https://github.com/dogsheep/dogsheep-photos/issues/17#issuecomment-624278090,https://api.github.com/repos/dogsheep/dogsheep-photos/issues/17,624278090,MDEyOklzc3VlQ29tbWVudDYyNDI3ODA5MA==,9599,2020-05-05T20:06:01Z,2020-05-05T20:06:01Z,MEMBER,https://www.python.org/dev/peps/pep-0508/#environment-markers I think I want `sys_platform` of `darwin`.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",612860531,
https://github.com/dogsheep/dogsheep-photos/issues/16#issuecomment-623865250,https://api.github.com/repos/dogsheep/dogsheep-photos/issues/16,623865250,MDEyOklzc3VlQ29tbWVudDYyMzg2NTI1MA==,9599,2020-05-05T05:38:16Z,2020-05-05T05:38:16Z,MEMBER,It looks like `groups.content_string` often has a null byte in it. I should clean this up as part of the import.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",612287234,
https://github.com/dogsheep/dogsheep-photos/issues/16#issuecomment-623863902,https://api.github.com/repos/dogsheep/dogsheep-photos/issues/16,623863902,MDEyOklzc3VlQ29tbWVudDYyMzg2MzkwMg==,9599,2020-05-05T05:31:53Z,2020-05-05T05:31:53Z,MEMBER,"Yes! Turning those `rowid` values into `id` with this script did the job:
```python
import sqlite3
import sqlite_utils

conn = sqlite3.connect(
    ""/Users/simon/Pictures/Photos Library.photoslibrary/database/search/psi.sqlite""
)


def all_rows(table):
    result = conn.execute(""select rowid as id, * from {}"".format(table))
    cols = [c[0] for c in result.description]
    for row in result.fetchall():
        yield dict(zip(cols, row))


if __name__ == ""__main__"":
    db = sqlite_utils.Database(""psi_copy.db"")
    for table in (""assets"", ""collections"", ""ga"", ""gc"", ""groups""):
        db[table].upsert_all(all_rows(table), pk=""id"", alter=True)
```
Then I ran this query:
```sql
select 
  json_object('img_src', 'https://photos.simonwillison.net/i/' || photos.sha256 || '.' || photos.ext || '?w=400') as photo,
   group_concat(strip_null_chars(groups.content_string), ' ') as words, assets.uuid_0, assets.uuid_1, to_uuid(assets.uuid_0, assets.uuid_1) as uuid
from assets join ga on assets.id = ga.assetid
join groups on ga.groupid = groups.id
join photos on photos.uuid = to_uuid(assets.uuid_0, assets.uuid_1)
where groups.category = 2024
group by assets.id
order by random() limit 10
```
And got these results!
<img width=""1054"" alt=""psi_copy__select_json_object__img_src____https___photos_simonwillison_net_i______photos_sha256___________photos_ext______w_400___as_photo__group_concat_strip_null_chars_groups_content_string________as_words__assets_uuid_0__assets_uuid_1__to"" src=""https://user-images.githubusercontent.com/9599/81037264-f1021a80-8e56-11ea-9924-6f9f55a0fb4b.png"">
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",612287234,
https://github.com/dogsheep/dogsheep-photos/issues/16#issuecomment-623857417,https://api.github.com/repos/dogsheep/dogsheep-photos/issues/16,623857417,MDEyOklzc3VlQ29tbWVudDYyMzg1NzQxNw==,9599,2020-05-05T05:01:47Z,2020-05-05T05:01:47Z,MEMBER,"Even that didn't work - it didn't copy across the rowid values. I'm pretty sure that's what's wrong here:
```
sqlite3 /Users/simon/Pictures/Photos\ Library.photoslibrary/database/search/psi.sqlite 'select rowid, uuid_0, uuid_1 from assets limit 10'                                              
1619605|-9205353363298198838|4814875488794983828
1641378|-9205348195631362269|390804289838822030
1634974|-9205331524553603243|-3834026796261633148
1619083|-9205326176986145401|7563404215614709654
22131|-9205315724827218763|8370531509591906734
1645633|-9205247376092758131|-1311540150497601346
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",612287234,
https://github.com/dogsheep/dogsheep-photos/issues/16#issuecomment-623855885,https://api.github.com/repos/dogsheep/dogsheep-photos/issues/16,623855885,MDEyOklzc3VlQ29tbWVudDYyMzg1NTg4NQ==,9599,2020-05-05T04:54:39Z,2020-05-05T04:54:53Z,MEMBER,"Trying this import mechanism instead:
`sqlite3 /Users/simon/Pictures/Photos\ Library.photoslibrary/database/search/psi.sqlite .dump | grep -v 'CREATE INDEX' | grep -v 'CREATE TRIGGER' | grep -v 'CREATE VIRTUAL TABLE' | sqlite3 search.db`","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",612287234,
https://github.com/dogsheep/dogsheep-photos/issues/16#issuecomment-623855841,https://api.github.com/repos/dogsheep/dogsheep-photos/issues/16,623855841,MDEyOklzc3VlQ29tbWVudDYyMzg1NTg0MQ==,9599,2020-05-05T04:54:28Z,2020-05-05T04:54:28Z,MEMBER,"Things were not matching up for me correctly:

<img width=""1143"" alt=""search__select_json_object__img_src____https___photos_simonwillison_net_i______photos_sha256___________photos_ext______w_400___as_photo__groups_content_string__assets_uuid_0__assets_uuid_1__to_uuid_assets_uuid_0__assets_uuid_1__as_uuid__pho"" src=""https://user-images.githubusercontent.com/9599/81035923-ca8db080-8e51-11ea-95a7-6ee60bae7502.png"">

I think that's because my import script didn't correctly import the existing `rowid` values.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",612287234,
https://github.com/dogsheep/dogsheep-photos/issues/16#issuecomment-623846880,https://api.github.com/repos/dogsheep/dogsheep-photos/issues/16,623846880,MDEyOklzc3VlQ29tbWVudDYyMzg0Njg4MA==,9599,2020-05-05T04:06:08Z,2020-05-05T04:06:08Z,MEMBER,"This function seems to convert them into UUIDs that match my photos:
```python
def to_uuid(uuid_0, uuid_1):
    b = uuid_0.to_bytes(8, 'little', signed=True) + uuid_1.to_bytes(8, 'little', signed=True)
    return str(uuid.UUID(bytes=b)).upper()
```","{""total_count"": 1, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 1, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",612287234,
https://github.com/dogsheep/dogsheep-photos/issues/16#issuecomment-623811131,https://api.github.com/repos/dogsheep/dogsheep-photos/issues/16,623811131,MDEyOklzc3VlQ29tbWVudDYyMzgxMTEzMQ==,9599,2020-05-05T03:16:18Z,2020-05-05T03:16:18Z,MEMBER,"Here's how to convert two integers unto a UUID using Java. Not sure if it's the solution I need though (or how to do the same thing in Python):

https://repl.it/repls/EuphoricSomberClasslibrary

<img width=""1494"" alt=""Repl_it_-_EuphoricSomberClasslibrary"" src=""https://user-images.githubusercontent.com/9599/81032267-0d488c00-8e44-11ea-9be7-680eaccd1611.png"">

```java
import java.util.UUID;

class Main {
  public static void main(String[] args) {
    java.util.UUID uuid = new java.util.UUID(
      2544182952487526660L,
      -3640314103732024685L
    );
    System.out.println(
      uuid
    );
  }
}
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",612287234,
https://github.com/dogsheep/dogsheep-photos/issues/16#issuecomment-623807568,https://api.github.com/repos/dogsheep/dogsheep-photos/issues/16,623807568,MDEyOklzc3VlQ29tbWVudDYyMzgwNzU2OA==,9599,2020-05-05T02:56:06Z,2020-05-05T02:56:06Z,MEMBER,"I'm pretty sure this is what I'm after. The `groups` table has what looks like identified labels in the rows with category = 2025:

<img width=""1122"" alt=""words__groups__2_528_rows_where_where_category___2025"" src=""https://user-images.githubusercontent.com/9599/81031361-e0df4080-8e40-11ea-9060-6d850aa52140.png"">

Then there's a `ga` table that maps groups to assets:

<img width=""304"" alt=""words__ga__633_653_rows"" src=""https://user-images.githubusercontent.com/9599/81031387-f48aa700-8e40-11ea-9a3d-da23903be928.png"">

And an `assets` table which looks like it has one row for every one of my photos:

<img width=""645"" alt=""words__assets__40_419_rows"" src=""https://user-images.githubusercontent.com/9599/81031402-04a28680-8e41-11ea-8047-e9199d068563.png"">

One major challenge: these UUIDs are split into two integer numbers, `uuid_0` and `uuid_1` - but the main photos database uses regular UUIDs like this:

![image](https://user-images.githubusercontent.com/9599/81031481-39164280-8e41-11ea-983b-005ced641a18.png)

I need to figure out how to match up these two different UUID representations. I asked on Twitter if anyone has any ideas: https://twitter.com/simonw/status/1257500689019703296","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",612287234,
https://github.com/dogsheep/dogsheep-photos/issues/16#issuecomment-623806687,https://api.github.com/repos/dogsheep/dogsheep-photos/issues/16,623806687,MDEyOklzc3VlQ29tbWVudDYyMzgwNjY4Nw==,9599,2020-05-05T02:51:16Z,2020-05-05T02:51:16Z,MEMBER,"Running datasette against it directly doesn't work:
```
simon@Simons-MacBook-Pro search % datasette psi.sqlite
Serve! files=('psi.sqlite',) (immutables=()) on port 8001
Usage: datasette serve [OPTIONS] [FILES]...

Error: Connection to psi.sqlite failed check: no such tokenizer: PSITokenizer
```
Instead, I created a new SQLite database with a copy of some of the key tables, like this:
```
sqlite-utils rows psi.sqlite groups | sqlite-utils insert /tmp/search.db groups -
sqlite-utils rows psi.sqlite assets | sqlite-utils insert /tmp/search.db assets -
sqlite-utils rows psi.sqlite ga | sqlite-utils insert /tmp/search.db ga -
sqlite-utils rows psi.sqlite collections | sqlite-utils insert /tmp/search.db collections -
sqlite-utils rows psi.sqlite gc | sqlite-utils insert /tmp/search.db gc -
sqlite-utils rows psi.sqlite lookup | sqlite-utils insert /tmp/search.db lookup -
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",612287234,
https://github.com/dogsheep/dogsheep-photos/issues/16#issuecomment-623806533,https://api.github.com/repos/dogsheep/dogsheep-photos/issues/16,623806533,MDEyOklzc3VlQ29tbWVudDYyMzgwNjUzMw==,9599,2020-05-05T02:50:16Z,2020-05-05T02:50:16Z,MEMBER,"I figured there must be a separate database that Photos uses to store the text of the identified labels.

I used ""Open Files and Ports"" in Activity Monitor against the Photos app to try and spot candidates... and found `/Users/simon/Pictures/Photos Library.photoslibrary/database/search/psi.sqlite` - a 53MB SQLite database file.

<img width=""1365"" alt=""Item-0_and_Item-0_and_Item-0_and_Item-0"" src=""https://user-images.githubusercontent.com/9599/81031213-61ea0800-8e40-11ea-8237-cfce4a5128e0.png"">

Here's the schema of that file:
```
$ sqlite3 psi.sqlite .schema
CREATE TABLE word_embedding(word TEXT, extended_word TEXT, score DOUBLE);
CREATE INDEX word_embedding_index ON word_embedding(word);
CREATE VIRTUAL TABLE word_embedding_prefix USING fts5(extended_word)
/* word_embedding_prefix(extended_word) */;
CREATE TABLE IF NOT EXISTS 'word_embedding_prefix_data'(id INTEGER PRIMARY KEY, block BLOB);
CREATE TABLE IF NOT EXISTS 'word_embedding_prefix_idx'(segid, term, pgno, PRIMARY KEY(segid, term)) WITHOUT ROWID;
CREATE TABLE IF NOT EXISTS 'word_embedding_prefix_content'(id INTEGER PRIMARY KEY, c0);
CREATE TABLE IF NOT EXISTS 'word_embedding_prefix_docsize'(id INTEGER PRIMARY KEY, sz BLOB);
CREATE TABLE IF NOT EXISTS 'word_embedding_prefix_config'(k PRIMARY KEY, v) WITHOUT ROWID;
CREATE TABLE groups(category INT2, owning_groupid INT, content_string TEXT, normalized_string TEXT, lookup_identifier TEXT, token_ranges_0 INT8, token_ranges_1 INT8, UNIQUE(category, owning_groupid, content_string, lookup_identifier, token_ranges_0, token_ranges_1));
CREATE TABLE assets(uuid_0 INT, uuid_1 INT, creationDate INT, UNIQUE(uuid_0, uuid_1));
CREATE TABLE ga(groupid INT, assetid INT, PRIMARY KEY(groupid, assetid));
CREATE TABLE collections(uuid_0 INT, uuid_1 INT, startDate INT, endDate INT, title TEXT, subtitle TEXT, keyAssetUUID_0 INT, keyAssetUUID_1 INT, typeAndNumberOfAssets INT32, sortDate DOUBLE, UNIQUE(uuid_0, uuid_1));
CREATE TABLE gc(groupid INT, collectionid INT, PRIMARY KEY(groupid, collectionid));
CREATE VIRTUAL TABLE prefix USING fts5(content='groups', normalized_string, category UNINDEXED, tokenize = 'PSITokenizer');
CREATE TABLE IF NOT EXISTS 'prefix_data'(id INTEGER PRIMARY KEY, block BLOB);
CREATE TABLE IF NOT EXISTS 'prefix_idx'(segid, term, pgno, PRIMARY KEY(segid, term)) WITHOUT ROWID;
CREATE TABLE IF NOT EXISTS 'prefix_docsize'(id INTEGER PRIMARY KEY, sz BLOB);
CREATE TABLE IF NOT EXISTS 'prefix_config'(k PRIMARY KEY, v) WITHOUT ROWID;
CREATE TABLE lookup(identifier TEXT PRIMARY KEY, category INT2);
CREATE TRIGGER trigger_groups_insert AFTER INSERT ON groups BEGIN INSERT INTO prefix(rowid, normalized_string, category) VALUES (new.rowid, new.normalized_string, new.category); END;
CREATE TRIGGER trigger_groups_delete AFTER DELETE ON groups BEGIN INSERT INTO prefix(prefix, rowid, normalized_string, category) VALUES('delete', old.rowid, old.normalized_string, old.category); END;
CREATE INDEX group_pk ON groups(category, content_string, normalized_string, lookup_identifier);
CREATE INDEX asset_pk ON assets(uuid_0, uuid_1);
CREATE INDEX ga_assetid ON ga(assetid, groupid);
CREATE INDEX collection_pk ON collections(uuid_0, uuid_1);
CREATE INDEX gc_collectionid ON gc(collectionid);
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",612287234,
https://github.com/dogsheep/dogsheep-photos/issues/16#issuecomment-623806085,https://api.github.com/repos/dogsheep/dogsheep-photos/issues/16,623806085,MDEyOklzc3VlQ29tbWVudDYyMzgwNjA4NQ==,9599,2020-05-05T02:47:18Z,2020-05-05T02:47:18Z,MEMBER,"In https://github.com/RhetTbull/osxphotos/issues/121#issuecomment-623249263 Rhet Turnbull spotted a table called `ZSCENEIDENTIFIER` which looked like it might have the right data, but the columns in it aren't particularly helpful:
```
Z_PK,Z_ENT,Z_OPT,ZSCENEIDENTIFIER,ZASSETATTRIBUTES,ZCONFIDENCE
8,49,1,731,5,0.11834716796875
9,49,1,684,6,0.0233648251742125
10,49,1,1702,1,0.026153564453125
```
I love the look of those confidence scores, but what do the numbers mean?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",612287234,
https://github.com/dogsheep/dogsheep-photos/issues/16#issuecomment-623805823,https://api.github.com/repos/dogsheep/dogsheep-photos/issues/16,623805823,MDEyOklzc3VlQ29tbWVudDYyMzgwNTgyMw==,9599,2020-05-05T02:45:56Z,2020-05-05T02:45:56Z,MEMBER,I filed an issue with `osxphotos` about this here: https://github.com/RhetTbull/osxphotos/issues/121,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",612287234,