issue_comments
5 rows where "updated_at" is on date 2023-06-14 and user = 7908073 sorted by id
id ▼ | html_url | issue_url | node_id | user | created_at | updated_at | author_association | body | reactions | issue | performed_via_github_app |
---|---|---|---|---|---|---|---|---|---|---|---|
1264218914 | https://github.com/simonw/sqlite-utils/issues/491#issuecomment-1264218914 | https://api.github.com/repos/simonw/sqlite-utils/issues/491 | IC_kwDOCGYnMM5LWnMi | chapmanjacobd 7908073 | 2022-10-01T03:18:36Z | 2023-06-14T22:14:24Z | CONTRIBUTOR |
I actually found myself wanting something like this the past couple of days. The use-case was databases with slightly different schemas but the same table names. Here is a full script:

```
import argparse
from pathlib import Path

from sqlite_utils import Database


def connect(args, conn=None, **kwargs) -> Database:
    db = Database(conn or args.database, **kwargs)
    with db.conn:
        db.conn.execute("PRAGMA main.cache_size = 8000")
    return db


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    parser.add_argument("database")
    parser.add_argument("dbs_folder")
    parser.add_argument("--db", "-db", help=argparse.SUPPRESS)
    parser.add_argument("--verbose", "-v", action="count", default=0)
    args = parser.parse_args()
    return args


def merge_db(args, source_db):
    source_db = str(Path(source_db).resolve())


def merge_directory():
    args = parse_args()
    source_dbs = list(Path(args.dbs_folder).glob('*.db'))
    for s_db in source_dbs:
        merge_db(args, s_db)


if __name__ == '__main__':
    merge_directory()
```

edit: I've made some improvements to this and put it on PyPI:

```
$ pip install xklb
$ lb merge-db -h
usage: library merge-dbs DEST_DB SOURCE_DB ... [--only-target-columns] [--only-new-rows] [--upsert] [--pk PK ...] [--table TABLE ...]

positional arguments:
  database
  source_dbs
```

Also if you want to dedupe a table based on a "business key" which isn't explicitly your primary key(s) you can run this:

```
$ lb dedupe-db -h
usage: library dedupe-dbs DATABASE TABLE --bk BUSINESS_KEYS [--pk PRIMARY_KEYS] [--only-columns COLUMNS]

positional arguments:
  database
  table

options:
  -h, --help            show this help message and exit
  --skip-0
  --only-columns ONLY_COLUMNS
                        Comma separated column names to upsert
  --primary-keys PRIMARY_KEYS, --pk PRIMARY_KEYS
                        Comma separated primary keys
  --business-keys BUSINESS_KEYS, --bk BUSINESS_KEYS
                        Comma separated business keys
```
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Ability to merge databases and tables 1383646615 | |
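For readers landing here: a minimal sketch of the table-by-table merge the comment above describes, using the documented sqlite-utils Python API. The filenames and the use of `alter=True` to absorb schema differences are assumptions for illustration, not the author's exact approach:

```python
# Hypothetical sketch: merge every table from source.db into dest.db.
# alter=True makes insert_all() add any columns the destination table is
# missing, one way to handle "slightly different schema but same table names".
from sqlite_utils import Database

source = Database("source.db")  # assumed filename
dest = Database("dest.db")      # assumed filename

for table_name in source.table_names():
    dest[table_name].insert_all(source[table_name].rows, alter=True)
```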
1590531892 | https://github.com/simonw/sqlite-utils/issues/557#issuecomment-1590531892 | https://api.github.com/repos/simonw/sqlite-utils/issues/557 | IC_kwDOCGYnMM5ezZc0 | chapmanjacobd 7908073 | 2023-06-14T06:09:21Z | 2023-06-14T06:09:21Z | CONTRIBUTOR | I put together a simple script to upsert and remove duplicate rows based on business keys. If anyone has similar problems with the above, this might help:

```
CREATE TABLE my_table (
    id INTEGER PRIMARY KEY,
    column1 TEXT,
    column2 TEXT,
    column3 TEXT
);

INSERT INTO my_table (column1, column2, column3)
VALUES
    ('Value 1', 'Duplicate 1', 'Duplicate A'),
    ('Value 2', 'Duplicate 2', 'Duplicate B'),
    ('Value 3', 'Duplicate 2', 'Duplicate C'),
    ('Value 4', 'Duplicate 3', 'Duplicate D'),
    ('Value 5', 'Duplicate 3', 'Duplicate E'),
    ('Value 6', 'Duplicate 3', 'Duplicate F');
```
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Aliased ROWID option for tables created from alter=True commands 1740150327 | |
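The comment as captured here only shows the setup data, not the dedupe step itself. Below is a minimal sketch of one common way to do that step, assuming column2 plays the role of the business key and that keeping the lowest id per key is acceptable; the filename is hypothetical:

```python
# Hypothetical sketch: delete duplicate rows from my_table, keeping the
# row with the lowest id for each distinct business-key value (column2).
import sqlite3

conn = sqlite3.connect("my.db")  # assumed filename
conn.execute(
    """
    DELETE FROM my_table
    WHERE id NOT IN (
        SELECT MIN(id) FROM my_table GROUP BY column2
    )
    """
)
conn.commit()
```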
1592047502 | https://github.com/simonw/sqlite-utils/issues/555#issuecomment-1592047502 | https://api.github.com/repos/simonw/sqlite-utils/issues/555 | IC_kwDOCGYnMM5e5LeO | chapmanjacobd 7908073 | 2023-06-14T22:00:10Z | 2023-06-14T22:01:57Z | CONTRIBUTOR | You may want to try doing a performance comparison between this and just selecting all the ids with few constraints and then doing the filtering within Python. That might seem like a lazy-programmer, inefficient way, but queries with large result sets have a different profile than what databases like SQLite are designed for. That is not to say that SQLite is slow or that Python is always faster, but when you start reading >20% of an index there is an equilibrium that is reached. Especially when adding in writing extra temp tables and stuff to memory/disk. And especially given the

You may also try chunking like this:

```py
from typing import Generator

def chunks(lst, n) -> Generator:
    for i in range(0, len(lst), n):
        yield lst[i : i + n]

# stay under SQLite's bound-parameter limit per statement
SQLITE_PARAM_LIMIT = 32765

data = []
chunked = chunks(video_ids, SQLITE_PARAM_LIMIT)
for ids in chunked:
    data.extend(
        list(
            db.query(
                "SELECT * FROM videos WHERE id IN ("
                + ",".join(["?"] * len(ids))
                + ")",
                (*ids,),
            )
        )
    )
```

but that actually won't work with your

Since you are doing stuff with files/videos in SQLite you might be interested in my side project: https://github.com/chapmanjacobd/library |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Filter table by a large bunch of ids 1733198948 | |
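As a companion to the comment above, a minimal sketch of the "filter within Python" alternative it suggests; `db` is assumed to be a sqlite_utils.Database and `video_ids` the large id list from the issue:

```python
# Hypothetical sketch: select with few constraints, then filter in Python.
# db.query() streams rows as dicts, so the membership test happens per row.
wanted = set(video_ids)  # set membership is O(1) per lookup

matching = [
    row for row in db.query("SELECT * FROM videos")
    if row["id"] in wanted
]
```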
1592052320 | https://github.com/simonw/sqlite-utils/issues/535#issuecomment-1592052320 | https://api.github.com/repos/simonw/sqlite-utils/issues/535 | IC_kwDOCGYnMM5e5Mpg | chapmanjacobd 7908073 | 2023-06-14T22:05:28Z | 2023-06-14T22:05:28Z | CONTRIBUTOR | piping to |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
rows: --transpose or psql extended view-like functionality 1655860104 | |
1592110694 | https://github.com/simonw/sqlite-utils/issues/529#issuecomment-1592110694 | https://api.github.com/repos/simonw/sqlite-utils/issues/529 | IC_kwDOCGYnMM5e5a5m | chapmanjacobd 7908073 | 2023-06-14T23:11:47Z | 2023-06-14T23:12:12Z | CONTRIBUTOR | sorry, I was wrong.

```
sqlite-utils --raw-lines :memory: "SELECT * FROM (VALUES ('test'), ('line2'))" | cat -A
test$
line2$

sqlite-utils --csv --no-headers :memory: "SELECT * FROM (VALUES ('test'), ('line2'))" | cat -A
test$
line2$
```

cat -A marks each newline with `$` and would show a carriage return as `^M`; none appears, so the output has Unix line endings. I think this was fixed somewhat recently
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Microsoft line endings 1581090327 |
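For anyone scripting the same check rather than eyeballing cat -A output, a small sketch; it only assumes sqlite-utils is on PATH:

```python
# Hypothetical sketch: check whether sqlite-utils CSV output contains CRLF.
import subprocess

out = subprocess.run(
    ["sqlite-utils", "--csv", "--no-headers", ":memory:",
     "SELECT * FROM (VALUES ('test'), ('line2'))"],
    capture_output=True,
    check=True,
).stdout

print("CRLF present:", b"\r\n" in out)  # False once the fix described above landed
```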
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id]),
   [performed_via_github_app] TEXT
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
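The selection at the top of this page ("updated_at" on 2023-06-14, user = 7908073, sorted by id) maps onto this schema roughly as follows; the database filename is an assumption:

```python
# Hypothetical sketch: reproduce this page's row filter with sqlite3.
# "github.db" is an assumed filename for the database behind this page.
import sqlite3

conn = sqlite3.connect("github.db")
rows = conn.execute(
    """
    SELECT id, user, updated_at, body
    FROM issue_comments
    WHERE date(updated_at) = '2023-06-14'
      AND user = 7908073
    ORDER BY id
    """
).fetchall()
print(len(rows))  # 5 for the rows shown above
```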