issue_comments


1 row where author_association = "OWNER", issue = 1091819089 and user = 9599 sorted by updated_at descending


id: 1006211113
html_url: https://github.com/simonw/sqlite-utils/issues/360#issuecomment-1006211113
issue_url: https://api.github.com/repos/simonw/sqlite-utils/issues/360
node_id: IC_kwDOCGYnMM47-ZAp
user: simonw (9599)
created_at: 2022-01-06T01:27:53Z
updated_at: 2022-01-06T01:27:53Z
author_association: OWNER

It looks like you were using sqlite-utils memory - that works by loading the entire file into an in-memory database, so a 170GB file is very likely to run out of RAM.

The line of code there exhibits another problem: it's reading the entire JSON file into a Python string, so it looks like it's going to run out of RAM even before it gets to the SQLite in-memory database section.

To handle a file of this size you'd need to write it to an on-disk SQLite database first. The sqlite-utils insert command can do this, and it should be able to "stream" records in from a file without loading the entire thing into memory - but only for newline-delimited JSON (JSON-NL) and CSV/TSV formats, not for JSON arrays.
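
As a rough illustration of why those formats can stream (a minimal sketch, not the actual sqlite-utils code): newline-delimited JSON keeps one record per line, so a reader only ever needs to hold the current line in memory.

import json

def stream_ndjson(path):
    # One record per line: only the current line is ever held in memory,
    # so memory use stays flat no matter how large the file is.
    with open(path, "r", encoding="utf-8") as fp:
        for line in fp:
            line = line.strip()
            if line:
                yield json.loads(line)

A JSON array has no per-record line boundary like that, so the standard library has to parse the whole document before it can hand anything back.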

The code in question is here:

https://github.com/simonw/sqlite-utils/blob/f3fd8613113d21d44238a6ec54b375f5aa72c4e0/sqlite_utils/cli.py#L738-L773

That code uses Python generators for the CSV/TSV/JSON-NL variants... but for regular JSON it does the following, which requires reading the entire file into memory:

https://github.com/simonw/sqlite-utils/blob/f3fd8613113d21d44238a6ec54b375f5aa72c4e0/sqlite_utils/cli.py#L767
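
Simplified, the shape of that branch looks something like this (a rough sketch, not the real cli.py code):

import csv
import json

def read_records(fp, fmt):
    if fmt == "csv":
        # Lazy: csv.DictReader yields one row at a time.
        return csv.DictReader(fp)
    if fmt == "nl":
        # Lazy: a generator that parses one line at a time.
        return (json.loads(line) for line in fp if line.strip())
    # Eager: json.load() parses the whole file into one Python object first.
    return json.load(fp)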

If you have the ability to control how your 170GB file is generated, you may have more luck converting it to CSV, TSV or newline-delimited JSON, then using sqlite-utils insert to load it into a database file.
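
If regenerating the file isn't an option, a streaming JSON parser could do that conversion without ever holding the whole array in memory. For example, using the third-party ijson library (an assumption for illustration - sqlite-utils itself doesn't use it; file names are placeholders):

import json
import ijson  # third-party streaming JSON parser - illustration only

def json_array_to_ndjson(src_path, dest_path):
    # Walk the top-level array one item at a time and write one JSON object per line.
    with open(src_path, "rb") as src, open(dest_path, "w", encoding="utf-8") as dest:
        for item in ijson.items(src, "item"):
            # default=str handles ijson's Decimal numbers.
            dest.write(json.dumps(item, default=str) + "\n")

json_array_to_ndjson("big.json", "big.ndjson")

The resulting newline-delimited file could then be loaded with sqlite-utils insert in its JSON-NL mode.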

To be honest though I've never tested this tooling with anything nearly that big, so it's possible you'll still run into problems. If you do I'd love to hear about them!

I would be tempted to tackle a job of this size by writing a custom Python script, either using the sqlite_utils Python library or even calling sqlite3 directly.
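
As a minimal sketch of the library route (assuming the file has already been converted to newline-delimited JSON as above; database, table and file names are placeholders):

import json
from sqlite_utils import Database

db = Database("big.db")  # on-disk database file

# insert_all() accepts any iterable, so feeding it a generator keeps memory
# use bounded; batch_size controls how many records are written per batch.
with open("big.ndjson", encoding="utf-8") as fp:
    records = (json.loads(line) for line in fp if line.strip())
    db["rows"].insert_all(records, batch_size=10000)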

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: MemoryError (1091819089)

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id]),
   [performed_via_github_app] TEXT
);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);