{"html_url": "https://github.com/simonw/sqlite-utils/issues/149#issuecomment-688501064", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/149", "id": 688501064, "node_id": "MDEyOklzc3VlQ29tbWVudDY4ODUwMTA2NA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-09-07T20:30:15Z", "updated_at": "2020-09-07T20:30:38Z", "author_association": "OWNER", "body": "The second challenge here is cleaning up all of those junk rows in existing `*_fts_docsize` tables. Doing that just to the demo database from https://github-to-sqlite.dogsheep.net/github.db dropped its size from 22MB to 16MB! Here's the SQL:\r\n```sql\r\nDELETE FROM [licenses_fts_docsize] WHERE id NOT IN (\r\n SELECT rowid FROM [licenses_fts]);\r\n```\r\nI can do that as part of the existing `table.optimize()` method, which optimizes FTS tables.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 695319258, "label": "FTS table with 7 rows has _fts_docsize table with 9,141 rows"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/149#issuecomment-688499924", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/149", "id": 688499924, "node_id": "MDEyOklzc3VlQ29tbWVudDY4ODQ5OTkyNA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-09-07T20:25:40Z", "updated_at": "2020-09-07T20:25:50Z", "author_association": "OWNER", "body": "https://www.sqlite.org/pragma.html#pragma_recursive_triggers says:\r\n\r\n> Prior to SQLite [version 3.6.18](https://www.sqlite.org/releaselog/3_6_18.html) (2009-09-11), recursive triggers were not supported. The behavior of SQLite was always as if this pragma was set to OFF. Support for recursive triggers was added in version 3.6.18 but was initially turned OFF by default, for compatibility. 
Recursive triggers may be turned on by default in future versions of SQLite.\r\n\r\nSo I think the fix is to turn on `recursive_triggers` globally by default for `sqlite-utils`.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 695319258, "label": "FTS table with 7 rows has _fts_docsize table with 9,141 rows"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/149#issuecomment-688499650", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/149", "id": 688499650, "node_id": "MDEyOklzc3VlQ29tbWVudDY4ODQ5OTY1MA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-09-07T20:24:35Z", "updated_at": "2020-09-07T20:24:35Z", "author_association": "OWNER", "body": "This replicates the problem:\r\n```\r\n(github-to-sqlite) /tmp % sqlite-utils tables --counts github.db | grep licenses\r\n {\"table\": \"licenses\", \"count\": 7},\r\n {\"table\": \"licenses_fts_data\", \"count\": 35},\r\n {\"table\": \"licenses_fts_idx\", \"count\": 16},\r\n {\"table\": \"licenses_fts_docsize\", \"count\": 9151},\r\n {\"table\": \"licenses_fts_config\", \"count\": 1},\r\n {\"table\": \"licenses_fts\", \"count\": 7},\r\n(github-to-sqlite) /tmp % github-to-sqlite repos github.db dogsheep \r\n(github-to-sqlite) /tmp % sqlite-utils tables --counts github.db | grep licenses\r\n {\"table\": \"licenses\", \"count\": 7},\r\n {\"table\": \"licenses_fts_data\", \"count\": 45},\r\n {\"table\": \"licenses_fts_idx\", \"count\": 26},\r\n {\"table\": \"licenses_fts_docsize\", \"count\": 9161},\r\n {\"table\": \"licenses_fts_config\", \"count\": 1},\r\n {\"table\": \"licenses_fts\", \"count\": 7},\r\n```\r\nNote how the number of rows in `licenses_fts_docsize` goes from 9151 to 9161.\r\n\r\nThe number went up by ten. 
I used tracing from #151 to show that the following SQL executed ten times:\r\n```\r\nINSERT OR REPLACE INTO [licenses] ([key], [name], [node_id], [spdx_id], [url]) VALUES \r\n (?, ?, ?, ?, ?);\r\n```\r\nThen I tried executing `PRAGMA recursive_triggers=on;` at the start of the script. This fixed the problem - running the script did not increase the number of rows in `licenses_fts_docsize`.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 695319258, "label": "FTS table with 7 rows has _fts_docsize table with 9,141 rows"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/149#issuecomment-688482355", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/149", "id": 688482355, "node_id": "MDEyOklzc3VlQ29tbWVudDY4ODQ4MjM1NQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-09-07T19:22:51Z", "updated_at": "2020-09-07T19:22:51Z", "author_association": "OWNER", "body": "And the SQLite documentation says:\r\n> When the REPLACE conflict resolution strategy deletes rows in order to satisfy a constraint, [delete triggers](https://www.sqlite.org/lang_createtrigger.html) fire if and only if [recursive triggers](https://www.sqlite.org/pragma.html#pragma_recursive_triggers) are enabled.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 695319258, "label": "FTS table with 7 rows has _fts_docsize table with 9,141 rows"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/149#issuecomment-688482055", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/149", "id": 688482055, "node_id": "MDEyOklzc3VlQ29tbWVudDY4ODQ4MjA1NQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-09-07T19:21:42Z", 
"updated_at": "2020-09-07T19:21:42Z", "author_association": "OWNER", "body": "Using `replace=True` there executes `INSERT OR REPLACE` - and Dan Kennedy (SQLite maintainer) on the SQLite forums said this:\r\n> Are you using \"REPLACE INTO\", or \"UPDATE OR REPLACE\" on the \"licenses\" table without having first executed \"PRAGMA recursive_triggers = 1\"? The docs note that delete triggers will not be fired in this case, which would explain things. Second paragraph under \"REPLACE\" here:\r\n>\r\n> https://www.sqlite.org/lang_conflict.html", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 695319258, "label": "FTS table with 7 rows has _fts_docsize table with 9,141 rows"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/149#issuecomment-688481374", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/149", "id": 688481374, "node_id": "MDEyOklzc3VlQ29tbWVudDY4ODQ4MTM3NA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-09-07T19:19:08Z", "updated_at": "2020-09-07T19:19:08Z", "author_association": "OWNER", "body": "Reading through the code for `github-to-sqlite repos` - one of the things it does is call `save_license` for each repo:\r\n\r\nhttps://github.com/dogsheep/github-to-sqlite/blob/39b2234253096bd579feed4e25104698b8ccd2ba/github_to_sqlite/utils.py#L259-L262\r\n\r\n```python\r\ndef save_license(db, license):\r\n if license is None:\r\n return None\r\n return db[\"licenses\"].insert(license, pk=\"key\", replace=True).last_pk\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 695319258, "label": "FTS table with 7 rows has _fts_docsize table with 9,141 rows"}, "performed_via_github_app": null} {"html_url": 
"https://github.com/simonw/sqlite-utils/issues/149#issuecomment-688480665", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/149", "id": 688480665, "node_id": "MDEyOklzc3VlQ29tbWVudDY4ODQ4MDY2NQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-09-07T19:16:20Z", "updated_at": "2020-09-07T19:16:20Z", "author_association": "OWNER", "body": "Aha! I have managed to replicate the bug:\r\n```\r\n(github-to-sqlite) /tmp % sqlite-utils tables --counts github.db | grep licenses\r\n {\"table\": \"licenses\", \"count\": 7},\r\n {\"table\": \"licenses_fts_data\", \"count\": 35},\r\n {\"table\": \"licenses_fts_idx\", \"count\": 16},\r\n {\"table\": \"licenses_fts_docsize\", \"count\": 9151},\r\n {\"table\": \"licenses_fts_config\", \"count\": 1},\r\n {\"table\": \"licenses_fts\", \"count\": 7},\r\n(github-to-sqlite) /tmp % github-to-sqlite repos github.db dogsheep \r\n(github-to-sqlite) /tmp % sqlite-utils tables --counts github.db | grep licenses\r\n {\"table\": \"licenses\", \"count\": 7},\r\n {\"table\": \"licenses_fts_data\", \"count\": 45},\r\n {\"table\": \"licenses_fts_idx\", \"count\": 26},\r\n {\"table\": \"licenses_fts_docsize\", \"count\": 9161},\r\n {\"table\": \"licenses_fts_config\", \"count\": 1},\r\n {\"table\": \"licenses_fts\", \"count\": 7},\r\n```\r\nNote that the number of records in `licenses_fts_docsize` went from 9151 to 9161.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 695319258, "label": "FTS table with 7 rows has _fts_docsize table with 9,141 rows"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/149#issuecomment-688464181", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/149", "id": 688464181, "node_id": "MDEyOklzc3VlQ29tbWVudDY4ODQ2NDE4MQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": 
"2020-09-07T18:19:54Z", "updated_at": "2020-09-07T18:19:54Z", "author_association": "OWNER", "body": "Even though that table doesn't declare an integer primary key, it does have a `rowid` column: https://github-to-sqlite.dogsheep.net/github?sql=select+rowid%2C+%5Bkey%5D%2C+name%2C+spdx_id%2C+url%2C+node_id+from+licenses+order+by+%5Bkey%5D+limit+101\r\n\r\n| rowid | key | name | spdx_id | url | node_id |\r\n| --- | --- | --- | --- | --- | --- |\r\n| 9150 | apache-2.0 | Apache License 2.0 | Apache-2.0 | | MDc6TGljZW5zZTI= |\r\n| 112 | bsd-3-clause | BSD 3-Clause \"New\" or \"Revised\" License | BSD-3-Clause | | MDc6TGljZW5zZTU= |\r\n\r\nhttps://www.sqlite.org/rowidtable.html has this clue:\r\n\r\n> If the rowid is not aliased by INTEGER PRIMARY KEY then it is not persistent and might change. In particular the VACUUM command will change rowids for tables that do not declare an INTEGER PRIMARY KEY. Therefore, applications should not normally access the rowid directly, but instead use an INTEGER PRIMARY KEY. 
", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 695319258, "label": "FTS table with 7 rows has _fts_docsize table with 9,141 rows"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/149#issuecomment-688460865", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/149", "id": 688460865, "node_id": "MDEyOklzc3VlQ29tbWVudDY4ODQ2MDg2NQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-09-07T18:07:14Z", "updated_at": "2020-09-07T18:07:14Z", "author_association": "OWNER", "body": "Another likely culprit: `licenses` has a text primary key, so it's not using `rowid`:\r\n```sql\r\nCREATE TABLE [licenses] (\r\n [key] TEXT PRIMARY KEY,\r\n [name] TEXT,\r\n [spdx_id] TEXT,\r\n [url] TEXT,\r\n [node_id] TEXT\r\n);\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 695319258, "label": "FTS table with 7 rows has _fts_docsize table with 9,141 rows"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/149#issuecomment-688460729", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/149", "id": 688460729, "node_id": "MDEyOklzc3VlQ29tbWVudDY4ODQ2MDcyOQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-09-07T18:06:44Z", "updated_at": "2020-09-07T18:06:44Z", "author_association": "OWNER", "body": "First posted on SQLite forum here but I'm pretty sure this is a bug in how `sqlite-utils` created those tables: https://sqlite.org/forum/forumpost/51aada1b45", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 695319258, "label": "FTS table with 7 rows has _fts_docsize table with 
9,141 rows"}, "performed_via_github_app": null}