home / github / issue_comments

Menu
  • Search all tables
  • GraphQL API

issue_comments: 1006211113

This data as json

html_url issue_url id node_id user created_at updated_at author_association body reactions issue performed_via_github_app
https://github.com/simonw/sqlite-utils/issues/360#issuecomment-1006211113 https://api.github.com/repos/simonw/sqlite-utils/issues/360 1006211113 IC_kwDOCGYnMM47-ZAp 9599 2022-01-06T01:27:53Z 2022-01-06T01:27:53Z OWNER

It looks like you were using sqlite-utils memory - that works by loading the entire file into an in-memory database, so 170GB is very likely to run out of RAM.

The line of code there exhibits another problem: it's reading the entire JSON file into a Python string, so it looks like it's going to run out of RAM even before it gets to the SQLite in-memory database section.

To handle a file of this size you'd need to write it to a SQLite database on-disk first. The sqlite-utils insert command can do this, and it should be able to "stream" records in from a file without loading the entire thing into memory - but only for JSON-NL and CSV/TSV formats, not for JSON arrays.

The code in question is here:

https://github.com/simonw/sqlite-utils/blob/f3fd8613113d21d44238a6ec54b375f5aa72c4e0/sqlite_utils/cli.py#L738-L773

That's using Python generators for the CSV/TSV/JSON-NL variants... but it's doing this for regular JSON which requires reading the entire thing into memory:

https://github.com/simonw/sqlite-utils/blob/f3fd8613113d21d44238a6ec54b375f5aa72c4e0/sqlite_utils/cli.py#L767

If you have the ability to control how your 170GB file is generated you may have more luck converting it to CSV or TSV or newline-delimited JSON, then using sqlite-utils insert to insert it into a database file.

To be honest though I've never tested this tooling with anything nearly that big, so it's possible you'll still run into problems. If you do I'd love to hear about them!

I would be tempted to tackle this size of job by writing a custom Python script, either using the sqlite_utils Python library or even calling sqlite3 directly.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
1091819089  
Powered by Datasette · Queries took 0.767ms · About: github-to-sqlite