issue_comments


1 row where author_association = "OWNER", issue = 1091819089 and user = 9599 sorted by updated_at descending


id: 1006211113
html_url: https://github.com/simonw/sqlite-utils/issues/360#issuecomment-1006211113
issue_url: https://api.github.com/repos/simonw/sqlite-utils/issues/360
node_id: IC_kwDOCGYnMM47-ZAp
user: simonw (9599)
created_at: 2022-01-06T01:27:53Z
updated_at: 2022-01-06T01:27:53Z
author_association: OWNER

It looks like you were using sqlite-utils memory - that works by loading the entire file into an in-memory database, so a 170GB file is very likely to run out of RAM.

The line of code there exhibits another problem: it's reading the entire JSON file into a Python string, so it looks like it's going to run out of RAM even before it gets to the SQLite in-memory database section.

To handle a file of this size you'd need to write it to an on-disk SQLite database first. The sqlite-utils insert command can do this, and it should be able to "stream" records in from a file without loading the entire thing into memory - but only for newline-delimited JSON (JSON-NL) and CSV/TSV formats, not for JSON arrays.
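
As a rough illustration of why those formats can stream (a minimal sketch, not the actual sqlite-utils code): newline-delimited JSON keeps one record per line, so a reader only ever needs to hold the current line in memory.

import json

def stream_ndjson(path):
    # One record per line: only the current line is ever held in memory,
    # so memory use stays flat no matter how large the file is.
    with open(path, "r", encoding="utf-8") as fp:
        for line in fp:
            line = line.strip()
            if line:
                yield json.loads(line)

A JSON array has no per-record line boundary like that, so the standard library has to parse the whole document before it can hand anything back.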

The code in question is here:

https://github.com/simonw/sqlite-utils/blob/f3fd8613113d21d44238a6ec54b375f5aa72c4e0/sqlite_utils/cli.py#L738-L773

That code uses Python generators for the CSV/TSV/JSON-NL variants... but for regular JSON it does the following, which requires reading the entire file into memory:

https://github.com/simonw/sqlite-utils/blob/f3fd8613113d21d44238a6ec54b375f5aa72c4e0/sqlite_utils/cli.py#L767
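
Simplified, the shape of that branch looks something like this (a rough sketch, not the real cli.py code):

import csv
import json

def read_records(fp, fmt):
    if fmt == "csv":
        # Lazy: csv.DictReader yields one row at a time.
        return csv.DictReader(fp)
    if fmt == "nl":
        # Lazy: a generator that parses one line at a time.
        return (json.loads(line) for line in fp if line.strip())
    # Eager: json.load() parses the whole file into one Python object first.
    return json.load(fp)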

If you have the ability to control how your 170GB file is generated, you may have more luck converting it to CSV, TSV or newline-delimited JSON, then using sqlite-utils insert to load it into a database file.
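
If regenerating the file isn't an option, a streaming JSON parser could do that conversion without ever holding the whole array in memory. For example, using the third-party ijson library (an assumption for illustration - sqlite-utils itself doesn't use it; file names are placeholders):

import json
import ijson  # third-party streaming JSON parser - illustration only

def json_array_to_ndjson(src_path, dest_path):
    # Walk the top-level array one item at a time and write one JSON object per line.
    with open(src_path, "rb") as src, open(dest_path, "w", encoding="utf-8") as dest:
        for item in ijson.items(src, "item"):
            # default=str handles ijson's Decimal numbers.
            dest.write(json.dumps(item, default=str) + "\n")

json_array_to_ndjson("big.json", "big.ndjson")

The resulting newline-delimited file could then be loaded with sqlite-utils insert in its JSON-NL mode.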

To be honest though I've never tested this tooling with anything nearly that big, so it's possible you'll still run into problems. If you do I'd love to hear about them!

I would be tempted to tackle a job of this size by writing a custom Python script, either using the sqlite_utils Python library or even calling sqlite3 directly.
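
As a minimal sketch of the library route (assuming the file has already been converted to newline-delimited JSON as above; database, table and file names are placeholders):

import json
from sqlite_utils import Database

db = Database("big.db")  # on-disk database file

# insert_all() accepts any iterable, so feeding it a generator keeps memory
# use bounded; batch_size controls how many records are written per batch.
with open("big.ndjson", encoding="utf-8") as fp:
    records = (json.loads(line) for line in fp if line.strip())
    db["rows"].insert_all(records, batch_size=10000)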

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: MemoryError (1091819089)

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id]),
   [performed_via_github_app] TEXT
);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);