issue_comments

5 rows where author_association = "CONTRIBUTOR" and updated_at is on date 2023-06-14, sorted by updated_at descending.

Comment 1592110694 · chapmanjacobd (7908073) · CONTRIBUTOR · created 2023-06-14T23:11:47Z · updated 2023-06-14T23:12:12Z · https://github.com/simonw/sqlite-utils/issues/529#issuecomment-1592110694

Sorry, I was wrong: `sqlite-utils --raw-lines` works correctly.

```
sqlite-utils --raw-lines :memory: "SELECT * FROM (VALUES ('test'), ('line2'))" | cat -A
test$
line2$

sqlite-utils --csv --no-headers :memory: "SELECT * FROM (VALUES ('test'), ('line2'))" | cat -A
test$
line2$
```

I think this was fixed somewhat recently.

Reactions: none · Issue: Microsoft line endings (1581090327)

Comment 1264218914 · chapmanjacobd (7908073) · CONTRIBUTOR · created 2022-10-01T03:18:36Z · updated 2023-06-14T22:14:24Z · https://github.com/simonw/sqlite-utils/issues/491#issuecomment-1264218914

> some good concrete use-cases in mind

I actually found myself wanting something like this the past couple of days. The use case was databases with slightly different schemas but the same table names.

Here is a full script:

```
import argparse
from pathlib import Path

from sqlite_utils import Database


def connect(args, conn=None, **kwargs) -> Database:
    db = Database(conn or args.database, **kwargs)
    with db.conn:
        db.conn.execute("PRAGMA main.cache_size = 8000")
    return db


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    parser.add_argument("database")
    parser.add_argument("dbs_folder")
    parser.add_argument("--db", "-db", help=argparse.SUPPRESS)
    parser.add_argument("--verbose", "-v", action="count", default=0)
    args = parser.parse_args()

    if args.db:
        args.database = args.db
    Path(args.database).touch()
    args.db = connect(args)

    return args


def merge_db(args, source_db):
    source_db = str(Path(source_db).resolve())
    s_db = connect(argparse.Namespace(database=source_db, verbose=args.verbose))

    # Copy every table from the source db into the target, adding any
    # missing columns (alter=True) and replacing rows on conflict.
    for table in s_db.table_names():
        data = s_db[table].rows
        args.db[table].insert_all(data, alter=True, replace=True)

    args.db.conn.commit()


def merge_directory():
    args = parse_args()
    source_dbs = list(Path(args.dbs_folder).glob("*.db"))
    for s_db in source_dbs:
        merge_db(args, s_db)


if __name__ == "__main__":
    merge_directory()
```
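Saved as, say, `merge_dbs.py` (the file name is hypothetical), it merges every `*.db` file in a folder into the target database: `python merge_dbs.py target.db ./folder_of_dbs/`.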

edit: I've made some improvements to this and put it on PyPI:

```
$ pip install xklb
$ lb merge-db -h
usage: library merge-dbs DEST_DB SOURCE_DB ... [--only-target-columns] [--only-new-rows] [--upsert] [--pk PK ...] [--table TABLE ...]

Merge-DBs will insert new rows from source dbs to target db, table by table. If primary key(s) are provided,
and there is an existing row with the same PK, the default action is to delete the existing row and insert the new row
replacing all existing fields.

Upsert mode will update matching PK rows such that if a source row has a NULL field and
the destination row has a value then the value will be preserved instead of changed to the source row's NULL value.

Ignore mode (--only-new-rows) will insert only rows which don't already exist in the destination db

Test first by using temp databases as the destination db.
Try out different modes / flags until you are satisfied with the behavior of the program

    library merge-dbs --pk path (mktemp --suffix .db) tv.db movies.db

Merge database data and tables

    library merge-dbs --upsert --pk path video.db tv.db movies.db
    library merge-dbs --only-target-columns --only-new-rows --table media,playlists --pk path audio-fts.db audio.db

    library merge-dbs --pk id --only-tables subreddits reddit/81_New_Music.db audio.db
    library merge-dbs --only-new-rows --pk subreddit,path --only-tables reddit_posts reddit/81_New_Music.db audio.db -v

positional arguments:
  database
  source_dbs
```
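For readers who want the NULL-preserving upsert described above without installing xklb, it can be approximated with sqlite-utils directly. A minimal sketch, assuming a `media` table keyed on `path` (both names are taken from the examples above, not prescribed):

```py
from sqlite_utils import Database

dest = Database("video.db")  # destination db (file names are illustrative)
source = Database("tv.db")   # source db

for row in source["media"].rows:
    # Drop NULL fields from the source row so existing destination values
    # are preserved instead of being overwritten with NULL.
    non_null = {k: v for k, v in row.items() if v is not None}
    dest["media"].upsert(non_null, pk="path", alter=True)
```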

Also, if you want to dedupe a table based on a "business key" which isn't explicitly your primary key(s), you can run this:

```
$ lb dedupe-db -h
usage: library dedupe-dbs DATABASE TABLE --bk BUSINESS_KEYS [--pk PRIMARY_KEYS] [--only-columns COLUMNS]

Dedupe your database (not to be confused with the dedupe subcommand)

It should not need to be said but *backup* your database before trying this tool!

Dedupe-DB will help remove duplicate rows based on non-primary-key business keys

    library dedupe-db ./video.db media --bk path

If --primary-keys is not provided, table metadata primary keys will be used
If --only-columns is not provided, all non-primary and non-business key columns will be upserted

positional arguments:
  database
  table

options:
  -h, --help            show this help message and exit
  --skip-0
  --only-columns ONLY_COLUMNS
                        Comma separated column names to upsert
  --primary-keys PRIMARY_KEYS, --pk PRIMARY_KEYS
                        Comma separated primary keys
  --business-keys BUSINESS_KEYS, --bk BUSINESS_KEYS
                        Comma separated business keys
```

Reactions: none · Issue: Ability to merge databases and tables (1383646615)

Comment 1592052320 · chapmanjacobd (7908073) · CONTRIBUTOR · created 2023-06-14T22:05:28Z · updated 2023-06-14T22:05:28Z · https://github.com/simonw/sqlite-utils/issues/535#issuecomment-1592052320

Piping to jq is usually good enough.

Reactions: none · Issue: rows: --transpose or psql extended view-like functionality (1655860104)

Comment 1592047502 · chapmanjacobd (7908073) · CONTRIBUTOR · created 2023-06-14T22:00:10Z · updated 2023-06-14T22:01:57Z · https://github.com/simonw/sqlite-utils/issues/555#issuecomment-1592047502

You may want to try doing a performance comparison between this and just selecting all the ids with few constraints, then doing the filtering within Python.

That might seem like a lazy, inefficient approach, but queries with large result sets are a different profile than what databases like SQLite are designed for. That is not to say that SQLite is slow or that Python is always faster, but when you start reading more than 20% of an index there is an equilibrium that is reached, especially once you add in writing extra temp tables and other data to memory/disk. And especially given the NOT IN style of query...

You might also try chunking, like this:

```py
from typing import Generator


def chunks(lst, n) -> Generator:
    # Yield successive n-sized slices of lst.
    for i in range(0, len(lst), n):
        yield lst[i : i + n]


SQLITE_PARAM_LIMIT = 32765

data = []
chunked = chunks(video_ids, SQLITE_PARAM_LIMIT)
for ids in chunked:
    data.extend(
        list(
            db.query(
                "SELECT * FROM videos WHERE id IN ("
                + ",".join(["?"] * len(ids))
                + ")",
                (*ids,),
            )
        )
    )
```

But that actually won't work with your NOT IN requirements; you need to query the full result set to check any row.
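A minimal sketch of that full-result-set approach, assuming the same `videos` table, `db` handle, and `video_ids` list as above:

```py
# Fetch only the ids (one cheap column scan), then do the NOT IN
# filtering in Python with a set difference.
existing_ids = {row["id"] for row in db.query("SELECT id FROM videos")}
missing_ids = [v for v in video_ids if v not in existing_ids]
```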

Since you are doing stuff with files/videos in SQLite, you might be interested in my side project: https://github.com/chapmanjacobd/library

Reactions: none · Issue: Filter table by a large bunch of ids (1733198948)

Comment 1590531892 · chapmanjacobd (7908073) · CONTRIBUTOR · created 2023-06-14T06:09:21Z · updated 2023-06-14T06:09:21Z · https://github.com/simonw/sqlite-utils/issues/557#issuecomment-1590531892

I put together a simple script to upsert and remove duplicate rows based on business keys. If anyone has similar problems to the above, this might help:

```
CREATE TABLE my_table (
    id INTEGER PRIMARY KEY,
    column1 TEXT,
    column2 TEXT,
    column3 TEXT
);

INSERT INTO my_table (column1, column2, column3)
VALUES
    ('Value 1', 'Duplicate 1', 'Duplicate A'),
    ('Value 2', 'Duplicate 2', 'Duplicate B'),
    ('Value 3', 'Duplicate 2', 'Duplicate C'),
    ('Value 4', 'Duplicate 3', 'Duplicate D'),
    ('Value 5', 'Duplicate 3', 'Duplicate E'),
    ('Value 6', 'Duplicate 3', 'Duplicate F');
```

```
library dedupe-db test.db my_table --bk column2
```
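For comparison, a minimal sketch of one way to express that dedupe directly with sqlite-utils, keeping the lowest `id` per `column2` business key (an assumption about the intended semantics, not dedupe-db's actual implementation):

```py
from sqlite_utils import Database

db = Database("test.db")
# Keep the first row per business key (column2); delete the rest.
db.execute(
    "DELETE FROM my_table "
    "WHERE id NOT IN (SELECT MIN(id) FROM my_table GROUP BY column2)"
)
db.conn.commit()
```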

Reactions: none · Issue: Aliased ROWID option for tables created from alter=True commands (1740150327)

Table schema:

```sql
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
, [performed_via_github_app] TEXT);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);
```
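For reference, the filter this page describes can be reproduced against the schema above. A minimal sketch with sqlite-utils; the `github.db` filename is an assumption:

```py
from sqlite_utils import Database

db = Database("github.db")  # filename is an assumption
rows = db.query(
    """
    SELECT * FROM issue_comments
    WHERE author_association = 'CONTRIBUTOR'
      AND date(updated_at) = '2023-06-14'
    ORDER BY updated_at DESC
    """
)
for row in rows:
    print(row["id"], row["updated_at"])
```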