id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,pull_request,body,repo,type,active_lock_reason,performed_via_github_app,reactions,draft,state_reason
1199158210,I_kwDOCGYnMM5HebPC,423,.extract() doesn't set foreign key when extracted columns contain NULL value,37447552,closed,0,,,1,2022-04-10T20:05:30Z,2022-08-27T14:45:04Z,2022-08-27T14:45:04Z,NONE,,"I've run into an issue with `extract` and I don't believe this is the intended behaviour.
I'm working with a database with music listening information. Currently it has one large table `listens` that contains all information. I'm trying to normalize the database by extracting relevant columns to separate tables (`artists`, `tracks`, `albums`). Not every track has an album.
A simplified demonstration with just `track_title` and `album_title` columns:
```ipython
In [1]: import sqlite_utils
In [2]: db = sqlite_utils.Database(memory=True)
In [3]: db[""listens""].insert_all([
...: {""id"": 1, ""track_title"": ""foo"", ""album_title"": ""bar""},
...: {""id"": 2, ""track_title"": ""baz"", ""album_title"": None}
...: ], pk=""id"")
Out[3]:
```
The track in the first row has an album, the second track doesn't. Now I extract album information into a separate column:
```ipython
In [4]: db[""listens""].extract(columns=[""album_title""], table=""albums"", fk_column=""album_id"")
Out[4]:
In [5]: list(db[""albums""].rows)
Out[5]: [{'id': 1, 'album_title': 'bar'}, {'id': 2, 'album_title': None}]
In [6]: list(db[""listens""].rows)
Out[6]:
[{'id': 1, 'track_title': 'foo', 'album_id': 1},
{'id': 2, 'track_title': 'baz', 'album_id': None}]
```
This behaves as expected -- the `album` table contains entries for both the existing album and the NULL album. The `listens` table has a foreign key only for the first row (since the album in the second row was empty).
Now I want to extract the track information as well. Album information belongs to the track so I want to extract both columns to a new table.
```ipython
In [7]: db[""listens""].extract(columns=[""track_title"", ""album_id""], table=""tracks"", fk_column=""track_id"")
Out[7]:
In [8]: list(db[""tracks""].rows)
Out[8]:
[{'id': 1, 'track_title': 'foo', 'album_id': 1},
{'id': 2, 'track_title': 'baz', 'album_id': None}]
In [9]: list(db[""listens""].rows)
Out[9]: [{'id': 1, 'track_id': 1}, {'id': 2, 'track_id': None}]
```
Extracting to the `tracks` table worked fine (both tracks are present with correct columns). However, the `listens` table only has a foreign key to the newly created tracks for the first row, the foreign key in the second row is NULL.
Changing the order of extracts doesn't help.
I poked around in the source a bit and I believe [this line](https://github.com/simonw/sqlite-utils/blob/433813612ff9b4b501739fd7543bef0040dd51fe/sqlite_utils/db.py#L1737) (essentially comparing `NULL = NULL`) is the problem, but I don't know enough about SQL to create a reliable fix myself.",140912432,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/423/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed