{"html_url": "https://github.com/simonw/sqlite-utils/issues/491#issuecomment-1264218914", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/491", "id": 1264218914, "node_id": "IC_kwDOCGYnMM5LWnMi", "user": {"value": 7908073, "label": "chapmanjacobd"}, "created_at": "2022-10-01T03:18:36Z", "updated_at": "2023-06-14T22:14:24Z", "author_association": "CONTRIBUTOR", "body": "> some good concrete use-cases in mind\r\n\r\nI actually found myself wanting something like this the past couple days. The use-case was databases with slightly different schema but same table names.\r\n\r\nhere is a full script:\r\n\r\n```\r\nimport argparse\r\nfrom pathlib import Path\r\n\r\nfrom sqlite_utils import Database\r\n\r\n\r\ndef connect(args, conn=None, **kwargs) -> Database:\r\n db = Database(conn or args.database, **kwargs)\r\n with db.conn:\r\n db.conn.execute(\"PRAGMA main.cache_size = 8000\")\r\n return db\r\n\r\n\r\ndef parse_args() -> argparse.Namespace:\r\n parser = argparse.ArgumentParser()\r\n parser.add_argument(\"database\")\r\n parser.add_argument(\"dbs_folder\")\r\n parser.add_argument(\"--db\", \"-db\", help=argparse.SUPPRESS)\r\n parser.add_argument(\"--verbose\", \"-v\", action=\"count\", default=0)\r\n args = parser.parse_args()\r\n\r\n if args.db:\r\n args.database = args.db\r\n Path(args.database).touch()\r\n args.db = connect(args)\r\n\r\n return args\r\n\r\n\r\ndef merge_db(args, source_db):\r\n source_db = str(Path(source_db).resolve())\r\n\r\n s_db = connect(argparse.Namespace(database=source_db, verbose = args.verbose))\r\n for table in s_db.table_names():\r\n data = s_db[table].rows\r\n args.db[table].insert_all(data, alter=True, replace=True)\r\n\r\n args.db.conn.commit()\r\n\r\n\r\ndef merge_directory():\r\n args = parse_args()\r\n source_dbs = list(Path(args.dbs_folder).glob('*.db'))\r\n for s_db in source_dbs:\r\n merge_db(args, s_db)\r\n\r\n\r\nif __name__ == '__main__':\r\n merge_directory()\r\n```\r\n\r\nedit: I've made some improvements to this and put it on PyPI:\r\n\r\n```\r\n$ pip install xklb\r\n$ lb merge-db -h\r\nusage: library merge-dbs DEST_DB SOURCE_DB ... [--only-target-columns] [--only-new-rows] [--upsert] [--pk PK ...] [--table TABLE ...]\r\n\r\n Merge-DBs will insert new rows from source dbs to target db, table by table. If primary key(s) are provided,\r\n and there is an existing row with the same PK, the default action is to delete the existing row and insert the new row\r\n replacing all existing fields.\r\n\r\n Upsert mode will update matching PK rows such that if a source row has a NULL field and\r\n the destination row has a value then the value will be preserved instead of changed to the source row's NULL value.\r\n\r\n Ignore mode (--only-new-rows) will insert only rows which don't already exist in the destination db\r\n\r\n Test first by using temp databases as the destination db.\r\n Try out different modes / flags until you are satisfied with the behavior of the program\r\n\r\n library merge-dbs --pk path (mktemp --suffix .db) tv.db movies.db\r\n\r\n Merge database data and tables\r\n\r\n library merge-dbs --upsert --pk path video.db tv.db movies.db\r\n library merge-dbs --only-target-columns --only-new-rows --table media,playlists --pk path audio-fts.db audio.db\r\n\r\n library merge-dbs --pk id --only-tables subreddits reddit/81_New_Music.db audio.db\r\n library merge-dbs --only-new-rows --pk subreddit,path --only-tables reddit_posts reddit/81_New_Music.db audio.db -v\r\n\r\npositional arguments:\r\n database\r\n source_dbs\r\n```\r\n\r\nAlso if you want to dedupe a table based on a \"business key\" which isn't explicitly your primary key(s) you can run this:\r\n\r\n```\r\n$ lb dedupe-db -h\r\nusage: library dedupe-dbs DATABASE TABLE --bk BUSINESS_KEYS [--pk PRIMARY_KEYS] [--only-columns COLUMNS]\r\n\r\n Dedupe your database (not to be confused with the dedupe subcommand)\r\n\r\n It should not need to be said but *backup* your database before trying this tool!\r\n\r\n Dedupe-DB will help remove duplicate rows based on non-primary-key business keys\r\n\r\n library dedupe-db ./video.db media --bk path\r\n\r\n If --primary-keys is not provided table metadata primary keys will be used\r\n If --only-columns is not provided all non-primary and non-business key columns will be upserted\r\n\r\npositional arguments:\r\n database\r\n table\r\n\r\noptions:\r\n -h, --help show this help message and exit\r\n --skip-0\r\n --only-columns ONLY_COLUMNS\r\n Comma separated column names to upsert\r\n --primary-keys PRIMARY_KEYS, --pk PRIMARY_KEYS\r\n Comma separated primary keys\r\n --business-keys BUSINESS_KEYS, --bk BUSINESS_KEYS\r\n Comma separated business keys\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1383646615, "label": "Ability to merge databases and tables"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/491#issuecomment-1258712931", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/491", "id": 1258712931, "node_id": "IC_kwDOCGYnMM5LBm9j", "user": {"value": 25778, "label": "eyeseast"}, "created_at": "2022-09-26T22:31:58Z", "updated_at": "2022-09-26T22:31:58Z", "author_association": "CONTRIBUTOR", "body": "Right. The backup command will copy tables completely, but in the case of conflicting table names, the destination gets overwritten silently. That might not be what you want here. ", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1383646615, "label": "Ability to merge databases and tables"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/491#issuecomment-1258697384", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/491", "id": 1258697384, "node_id": "IC_kwDOCGYnMM5LBjKo", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-09-26T22:12:45Z", "updated_at": "2022-09-26T22:12:45Z", "author_association": "OWNER", "body": "That feels like a slightly different command to me - maybe `sqlite-utils backup data.db data-backup.db`? It doesn't have any of the mechanics for merging tables together. Could be a useful feature separately though.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1383646615, "label": "Ability to merge databases and tables"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/491#issuecomment-1258508215", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/491", "id": 1258508215, "node_id": "IC_kwDOCGYnMM5LA0-3", "user": {"value": 25778, "label": "eyeseast"}, "created_at": "2022-09-26T19:22:14Z", "updated_at": "2022-09-26T19:22:14Z", "author_association": "CONTRIBUTOR", "body": "This might be fairly straightforward using SQLite's backup utility: https://docs.python.org/3/library/sqlite3.html#sqlite3.Connection.backup\r\n\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1383646615, "label": "Ability to merge databases and tables"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/491#issuecomment-1258450447", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/491", "id": 1258450447, "node_id": "IC_kwDOCGYnMM5LAm4P", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-09-26T18:36:23Z", "updated_at": "2022-09-26T18:36:23Z", "author_association": "OWNER", "body": "This is also the kind of feature that would need to express itself in both the Python library and the CLI utility.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1383646615, "label": "Ability to merge databases and tables"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/491#issuecomment-1258449887", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/491", "id": 1258449887, "node_id": "IC_kwDOCGYnMM5LAmvf", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-09-26T18:35:50Z", "updated_at": "2022-09-26T18:35:50Z", "author_association": "OWNER", "body": "This is a really interesting idea.\r\n\r\nI'm nervous about needing to set the rules for how duplicate tables should be merged though. This feels like a complex topic - one where there isn't necessarily an obviously \"correct\" way of doing it, but where different problems that people are solving might need different merging approaches.\r\n\r\nLikewise, merging isn't just a database-to-database thing at that point - I could see a need for merging two tables using similar rules to those used for merging two databases.\r\n\r\nSo I think I'd want to have some good concrete use-cases in mind before trying to design how something like this should work. Will leave this thread open for people to drop those in!", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1383646615, "label": "Ability to merge databases and tables"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/491#issuecomment-1256858763", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/491", "id": 1256858763, "node_id": "IC_kwDOCGYnMM5K6iSL", "user": {"value": 7908073, "label": "chapmanjacobd"}, "created_at": "2022-09-24T04:50:59Z", "updated_at": "2022-09-24T04:52:08Z", "author_association": "CONTRIBUTOR", "body": "Instead of outputting binary data to stdout the interface might be better like this\r\n\r\n```\r\nsqlite-utils merge animals.db cats.db dogs.db\r\n```\r\n\r\nsimilar to `zip`, `ogr2ogr`, etc\r\n\r\nActually I think this might already be possible within `ogr2ogr`. I don't believe spatial data is a requirement though it might add an `ogc_id` column or something\r\n\r\n```\r\ncp cats.db animals.db\r\nogr2ogr -append animals.db dogs.db\r\nogr2ogr -append animals.db another.db\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1383646615, "label": "Ability to merge databases and tables"}, "performed_via_github_app": null}