github

This data as json, CSV

html_url	issue_url	id	node_id	user	created_at	updated_at	author_association	body	reactions	issue	performed_via_github_app
https://github.com/simonw/sqlite-utils/issues/529#issuecomment-1592110694	https://api.github.com/repos/simonw/sqlite-utils/issues/529	1592110694	IC_kwDOCGYnMM5e5a5m	7908073	2023-06-14T23:11:47Z	2023-06-14T23:12:12Z	CONTRIBUTOR	sorry i was wrong. `sqlite-utils --raw-lines` works correctly ``` sqlite-utils --raw-lines :memory: "SELECT * FROM (VALUES ('test'), ('line2'))" \| cat -A test$ line2$ sqlite-utils --csv --no-headers :memory: "SELECT * FROM (VALUES ('test'), ('line2'))" \| cat -A test$ line2$ ``` I think this was fixed somewhat recently	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	1581090327
https://github.com/simonw/sqlite-utils/issues/491#issuecomment-1264218914	https://api.github.com/repos/simonw/sqlite-utils/issues/491	1264218914	IC_kwDOCGYnMM5LWnMi	7908073	2022-10-01T03:18:36Z	2023-06-14T22:14:24Z	CONTRIBUTOR	> some good concrete use-cases in mind I actually found myself wanting something like this the past couple days. The use-case was databases with slightly different schema but same table names. here is a full script: ``` import argparse from pathlib import Path from sqlite_utils import Database def connect(args, conn=None, kwargs) -> Database: db = Database(conn or args.database, kwargs) with db.conn: db.conn.execute("PRAGMA main.cache_size = 8000") return db def parse_args() -> argparse.Namespace: parser = argparse.ArgumentParser() parser.add_argument("database") parser.add_argument("dbs_folder") parser.add_argument("--db", "-db", help=argparse.SUPPRESS) parser.add_argument("--verbose", "-v", action="count", default=0) args = parser.parse_args() if args.db: args.database = args.db Path(args.database).touch() args.db = connect(args) return args def merge_db(args, source_db): source_db = str(Path(source_db).resolve()) s_db = connect(argparse.Namespace(database=source_db, verbose = args.verbose)) for table in s_db.table_names(): data = s_db[table].rows args.db[table].insert_all(data, alter=True, replace=True) args.db.conn.commit() def merge_directory(): args = parse_args() source_dbs = list(Path(args.dbs_folder).glob('*.db')) for s_db in source_dbs: merge_db(args, s_db) if __name__ == '__main__': merge_directory() ``` edit: I've made some improvements to this and put it on PyPI: ``` $ pip install xklb $ lb merge-db -h usage: library merge-dbs DEST_DB SOURCE_DB ... [--only-target-columns] [--only-new-rows] [--upsert] [--pk PK ...] [--table TABLE ...] Merge-DBs will insert new rows from source dbs to target db, table by table. If primary key(s) are provided, and there is an existing row with the same PK, the default action is to delete the existing row and insert the new row replacing all exist…	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	1383646615
https://github.com/simonw/sqlite-utils/issues/535#issuecomment-1592052320	https://api.github.com/repos/simonw/sqlite-utils/issues/535	1592052320	IC_kwDOCGYnMM5e5Mpg	7908073	2023-06-14T22:05:28Z	2023-06-14T22:05:28Z	CONTRIBUTOR	piping to `jq` is good enough usually	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	1655860104
https://github.com/simonw/sqlite-utils/issues/555#issuecomment-1592047502	https://api.github.com/repos/simonw/sqlite-utils/issues/555	1592047502	IC_kwDOCGYnMM5e5LeO	7908073	2023-06-14T22:00:10Z	2023-06-14T22:01:57Z	CONTRIBUTOR	You may want to try doing a performance comparison between this and just selecting all the ids with few constraints and then doing the filtering within python. That might seem like a lazy-programmer, inefficient way but queries with large resultsets are a different profile than what databases like SQLITE are designed for. That is not to say that SQLITE is slow or that python is always faster but when you start reading >20% of an index there is an equilibrium that is reached. Especially when adding in writing extra temp tables and stuff to memory/disk. And especially given the `NOT IN` style of query... You may also try chunking like this: ```py def chunks(lst, n) -> Generator: for i in range(0, len(lst), n): yield lst[i : i + n] SQLITE_PARAM_LIMIT = 32765 data = [] chunked = chunks(video_ids, consts.SQLITE_PARAM_LIMIT) for ids in chunked: data.expand( list( db.query( f"""SELECT * from videos WHERE id in (""" + ",".join(["?"] * len(ids)) + ")", (*ids,), ) ) ) ``` but that actually won't work with your `NOT IN` requirements. You need to query the full resultset to check any row. Since you are doing stuff with files/videos in SQLITE you might be interested in my side project: https://github.com/chapmanjacobd/library	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	1733198948
https://github.com/simonw/sqlite-utils/issues/557#issuecomment-1590531892	https://api.github.com/repos/simonw/sqlite-utils/issues/557	1590531892	IC_kwDOCGYnMM5ezZc0	7908073	2023-06-14T06:09:21Z	2023-06-14T06:09:21Z	CONTRIBUTOR	I put together a [simple script](https://github.com/chapmanjacobd/library/blob/42129c5ebe15f9d74653c0f5ca4ed0c991d383e0/xklb/scripts/dedupe_db.py) to upsert and remove duplicate rows based on business keys. If anyone has similar problems with above this might help ``` CREATE TABLE my_table ( id INTEGER PRIMARY KEY, column1 TEXT, column2 TEXT, column3 TEXT ); INSERT INTO my_table (column1, column2, column3) VALUES ('Value 1', 'Duplicate 1', 'Duplicate A'), ('Value 2', 'Duplicate 2', 'Duplicate B'), ('Value 3', 'Duplicate 2', 'Duplicate C'), ('Value 4', 'Duplicate 3', 'Duplicate D'), ('Value 5', 'Duplicate 3', 'Duplicate E'), ('Value 6', 'Duplicate 3', 'Duplicate F'); ``` ``` library dedupe-db test.db my_table --bk column2 ```	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	1740150327