html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,issue,performed_via_github_app
https://github.com/simonw/sqlite-utils/issues/489#issuecomment-1248621072,https://api.github.com/repos/simonw/sqlite-utils/issues/489,1248621072,IC_kwDOCGYnMM5KbHIQ,9599,2022-09-15T20:56:09Z,2022-09-15T20:56:09Z,OWNER,"Prototype so far:
```diff
diff --git a/sqlite_utils/cli.py b/sqlite_utils/cli.py
index 767b170..d96c507 100644
--- a/sqlite_utils/cli.py
+++ b/sqlite_utils/cli.py
@@ -1762,6 +1762,17 @@ def query(
is_flag=True,
help=""Analyze resulting tables and output results"",
)
+@click.option(""--key"", help=""read data from this key of the root object"")
+@click.option(
+ ""--auto-key"",
+ is_flag=True,
+ help=""Find a key in the root object that is a list of objects"",
+)
+@click.option(
+ ""--analyze"",
+ is_flag=True,
+ help=""Analyze resulting tables and output results"",
+)
@load_extension_option
def memory(
paths,
@@ -1784,6 +1795,8 @@ def memory(
schema,
dump,
save,
+ key,
+ auto_key,
analyze,
load_extension,
):
@@ -1838,7 +1851,9 @@ def memory(
csv_table = stem
stem_counts[stem] = stem_counts.get(stem, 1) + 1
csv_fp = csv_path.open(""rb"")
- rows, format_used = rows_from_file(csv_fp, format=format, encoding=encoding)
+ rows, format_used = rows_from_file(
+ csv_fp, format=format, encoding=encoding, key=key, auto_key=auto_key
+ )
tracker = None
if format_used in (Format.CSV, Format.TSV) and not no_detect_types:
tracker = TypeTracker()
diff --git a/sqlite_utils/utils.py b/sqlite_utils/utils.py
index 8754554..2e69c26 100644
--- a/sqlite_utils/utils.py
+++ b/sqlite_utils/utils.py
@@ -231,6 +231,8 @@ def rows_from_file(
encoding: Optional[str] = None,
ignore_extras: Optional[bool] = False,
extras_key: Optional[str] = None,
+ key: Optional[str] = None,
+ auto_key: Optional[bool] = False,
) -> Tuple[Iterable[dict], Format]:
""""""
Load a sequence of dictionaries from a file-like object containing one of four different formats.
@@ -271,13 +273,31 @@ def rows_from_file(
:param encoding: the character encoding to use when reading CSV/TSV data
:param ignore_extras: ignore any extra fields on rows
:param extras_key: put any extra fields in a list with this key
+ :param key: read data from this key of the root object
+ :param auto_key: find a key in the root object that is a list of objects
""""""
if ignore_extras and extras_key:
raise ValueError(""Cannot use ignore_extras= and extras_key= together"")
+ if key and auto_key:
+ raise ValueError(""Cannot use key= and auto_key= together"")
if format == Format.JSON:
decoded = json.load(fp)
if isinstance(decoded, dict):
- decoded = [decoded]
+ if auto_key:
+ list_keys = [
+ k
+ for k in decoded
+ if isinstance(decoded[k], list)
+ and decoded[k]
+ and all(isinstance(o, dict) for o in decoded[k])
+ ]
+ if len(list_keys) == 1:
+ decoded = decoded[list_keys[0]]
+ elif key:
+ # Raises KeyError, I think that's OK
+ decoded = decoded[key]
+ if not isinstance(decoded, list):
+ decoded = [decoded]
if not isinstance(decoded, list):
raise RowsFromFileBadJSON(""JSON must be a list or a dictionary"")
return decoded, Format.JSON
@@ -305,7 +325,9 @@ def rows_from_file(
first_bytes = buffered.peek(2048).strip()
if first_bytes.startswith(b""["") or first_bytes.startswith(b""{""):
# TODO: Detect newline-JSON
- return rows_from_file(buffered, format=Format.JSON)
+ return rows_from_file(
+ buffered, format=Format.JSON, key=key, auto_key=auto_key
+ )
else:
dialect = csv.Sniffer().sniff(
first_bytes.decode(encoding or ""utf-8-sig"", ""ignore"")
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1374939463,
https://github.com/simonw/sqlite-utils/pull/486#issuecomment-1248591268,https://api.github.com/repos/simonw/sqlite-utils/issues/486,1248591268,IC_kwDOCGYnMM5Ka_2k,9599,2022-09-15T20:36:02Z,2022-09-15T20:40:03Z,OWNER,"I had a big CSV file lying around, I converted it to other formats like this:
sqlite-utils insert /tmp/t.db t /tmp/en.openfoodfacts.org.products.csv --csv
sqlite-utils rows /tmp/t.db t --nl > /tmp/big.nl
sqlite-utils rows /tmp/t.db t > /tmp/big.json
Then tested the progress bar like this:
sqlite-utils insert /tmp/t2.db t /tmp/big.nl --nl
Output:
```
sqlite-utils insert /tmp/t2.db t /tmp/big.nl --nl
[------------------------------------] 0%
[#######-----------------------------] 20% 00:00:20
```
With `--silent` it is silent.
And for regular JSON:
```
sqlite-utils insert /tmp/t3.db t /tmp/big.json
[####################################] 100%
```
This is actually not doing the right thing. The problem is that `sqlite-utils` doesn't include a streaming JSON parser, so it instead reads that entire JSON file into memory first (exhausting the progress bar to 100% instantly) and then does the rest of the work in-memory while the bar sticks at 100%.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1366512990,
https://github.com/simonw/sqlite-utils/issues/485#issuecomment-1248597643,https://api.github.com/repos/simonw/sqlite-utils/issues/485,1248597643,IC_kwDOCGYnMM5KbBaL,9599,2022-09-15T20:39:39Z,2022-09-15T20:39:52Z,OWNER,"A note from PR #486: https://github.com/simonw/sqlite-utils/issues/486#issuecomment-1248591268
> ```
> sqlite-utils insert /tmp/t3.db t /tmp/big.json
> [####################################] 100%
> ```
> This is actually not doing the right thing. The problem is that `sqlite-utils` doesn't include a streaming JSON parser, so it instead reads that entire JSON file into memory first (exhausting the progress bar to 100% instantly) and then does the rest of the work in-memory while the bar sticks at 100%.
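
A minimal sketch of why the bar jumps straight to 100%, assuming progress is measured by bytes read from the file. `CountingReader` is a hypothetical stand-in for the progress bar wrapper, not code from `sqlite-utils`:

```python
import io
import json


class CountingReader(io.BytesIO):
    # Hypothetical stand-in for a progress bar: tracks bytes consumed so far
    def __init__(self, data):
        super().__init__(data)
        self.consumed = 0

    def read(self, size=-1):
        chunk = super().read(size)
        self.consumed += len(chunk)
        return chunk

    def readline(self, size=-1):
        line = super().readline(size)
        self.consumed += len(line)
        return line


rows = [{'id': i} for i in range(3)]

# Regular JSON: json.load() calls fp.read() once, so the whole file is
# consumed before a single row is available
json_fp = CountingReader(json.dumps(rows).encode('utf-8'))
total = len(json_fp.getvalue())
decoded = json.load(json_fp)
json_progress_before_first_row = json_fp.consumed / total

# Newline-delimited JSON: each row can be parsed as soon as its line is read
nl_data = ''.join(json.dumps(r) + '\n' for r in rows).encode('utf-8')
nl_fp = CountingReader(nl_data)
first_row = json.loads(nl_fp.readline())
nl_progress_after_first_row = nl_fp.consumed / len(nl_data)
```

In this sketch the regular JSON reader has consumed the whole file before any row exists, while the newline-delimited reader has only consumed the first line.
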
I decided to land this anyway. If a streaming JSON parser is added later it will start to work.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1366423176,
https://github.com/simonw/sqlite-utils/pull/486#issuecomment-1248593835,https://api.github.com/repos/simonw/sqlite-utils/issues/486,1248593835,IC_kwDOCGYnMM5KbAer,9599,2022-09-15T20:37:14Z,2022-09-15T20:37:14Z,OWNER,"I'm going to land this anyway. The lack of a streaming JSON parser is a separate issue, I don't think it should block landing this improvement.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1366512990,
https://github.com/simonw/sqlite-utils/pull/486#issuecomment-1248582147,https://api.github.com/repos/simonw/sqlite-utils/issues/486,1248582147,IC_kwDOCGYnMM5Ka9oD,9599,2022-09-15T20:29:17Z,2022-09-15T20:29:17Z,OWNER,This looks good to me. I need to run some manual tests before merging (it's a good sign that the automated tests pass though).,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1366512990,
https://github.com/simonw/sqlite-utils/pull/486#issuecomment-1248568775,https://api.github.com/repos/simonw/sqlite-utils/issues/486,1248568775,IC_kwDOCGYnMM5Ka6XH,9599,2022-09-15T20:16:14Z,2022-09-15T20:16:14Z,OWNER,"https://github.com/actions/setup-python/blob/main/docs/advanced-usage.md#using-the-python-version-input says I can set the full version:
```
- uses: actions/setup-python@v4
with:
python-version: ""3.10.6""
```
I'll try that.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1366512990,
https://github.com/simonw/sqlite-utils/pull/486#issuecomment-1248567323,https://api.github.com/repos/simonw/sqlite-utils/issues/486,1248567323,IC_kwDOCGYnMM5Ka6Ab,9599,2022-09-15T20:14:45Z,2022-09-15T20:14:45Z,OWNER,"There's a fix for `mypy` that has landed but isn't out in a release yet:
- https://github.com/python/mypy/issues/13385
For the moment looks like pinning to Python 3.10.6 could help. Need to figure out how to do that in GitHub Actions though.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1366512990,
https://github.com/simonw/sqlite-utils/pull/486#issuecomment-1248565396,https://api.github.com/repos/simonw/sqlite-utils/issues/486,1248565396,IC_kwDOCGYnMM5Ka5iU,9599,2022-09-15T20:12:50Z,2022-09-15T20:12:50Z,OWNER,"Annoying `mypy` test failure:
```
/Users/runner/hostedtoolcache/Python/3.10.7/x64/lib/python3.10/site-packages/numpy/__init__.pyi:636:
error: Positional-only parameters are only supported in Python 3.8 and greater
```
Looks like this:
- https://github.com/python/mypy/issues/13627","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1366512990,
https://github.com/simonw/sqlite-utils/issues/489#issuecomment-1248522618,https://api.github.com/repos/simonw/sqlite-utils/issues/489,1248522618,IC_kwDOCGYnMM5KavF6,9599,2022-09-15T19:29:20Z,2022-09-15T19:29:20Z,OWNER,I think refactoring `sqlite-utils insert` to use `rows_from_file` needs to happen as part of this work.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1374939463,
https://github.com/simonw/sqlite-utils/issues/489#issuecomment-1248512739,https://api.github.com/repos/simonw/sqlite-utils/issues/489,1248512739,IC_kwDOCGYnMM5Kasrj,9599,2022-09-15T19:18:24Z,2022-09-15T19:21:01Z,OWNER,"Why doesn't `sqlite-utils insert` use the `rows_from_file` function I wonder?
https://github.com/simonw/sqlite-utils/issues/279#issuecomment-864207841 says:
> I can refactor `sqlite-utils insert` to use this new code too.
Maybe I forgot to do that?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1374939463,
https://github.com/simonw/sqlite-utils/issues/489#issuecomment-1248501824,https://api.github.com/repos/simonw/sqlite-utils/issues/489,1248501824,IC_kwDOCGYnMM5KaqBA,9599,2022-09-15T19:10:48Z,2022-09-15T19:10:48Z,OWNER,"This feels pretty good:
```
% sqlite-utils memory ~/Downloads/CVR_Export_20220908084311/*.json --schema --auto-key
CREATE TABLE [BallotTypeContestManifest] (
[BallotTypeId] INTEGER,
[ContestId] INTEGER
);
CREATE VIEW t1 AS select * from [BallotTypeContestManifest];
CREATE VIEW t AS select * from [BallotTypeContestManifest];
CREATE TABLE [BallotTypeManifest] (
[Description] TEXT,
[Id] INTEGER,
[ExternalId] TEXT
);
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1374939463,
https://github.com/simonw/sqlite-utils/issues/489#issuecomment-1248484094,https://api.github.com/repos/simonw/sqlite-utils/issues/489,1248484094,IC_kwDOCGYnMM5Kalr-,9599,2022-09-15T18:56:31Z,2022-09-15T18:56:31Z,OWNER,"Actually I quite like `--key X` - it could work for single nested objects too. You could insert a single record like this:
```json
{
""record"": {
""id"": 1
}
}
```
```
sqlite-utils insert db.db records record.json --key record
``` ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1374939463,
https://github.com/simonw/sqlite-utils/issues/489#issuecomment-1248481303,https://api.github.com/repos/simonw/sqlite-utils/issues/489,1248481303,IC_kwDOCGYnMM5KalAX,9599,2022-09-15T18:54:30Z,2022-09-15T18:55:14Z,OWNER,"Maybe this would make more sense as a mechanism where you can say ""Use the data in the key called X"" - but there's a special option for ""figure out that key automatically"".
The syntax then could be:
`--list-key List`
Or for automatic detection:
`--list-key-auto`
Could also go with `--key List` and `--key-auto` - but would that be as obvious as `--list-key`?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1374939463,
https://github.com/simonw/sqlite-utils/issues/489#issuecomment-1248479485,https://api.github.com/repos/simonw/sqlite-utils/issues/489,1248479485,IC_kwDOCGYnMM5Kakj9,9599,2022-09-15T18:52:52Z,2022-09-15T18:53:45Z,OWNER,"The most similar option I have at the moment is probably `--flatten`. What would good names for this option be?
- `--auto-list`
- `--auto-key`
- `--inner-key`
- `--auto-json`
- `--find-list`
- `--find-key`
Those are all bad.
Another option: introduce a new explicit format for it. Right now the explicit formats you can use are:
https://github.com/simonw/sqlite-utils/blob/d9b9e075f07a20f1137cd2e34ed5d3f1a3db4ad8/docs/cli-reference.rst#L153-L158
So I could add a `:autojson` format.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1374939463,
https://github.com/simonw/sqlite-utils/issues/489#issuecomment-1248475718,https://api.github.com/repos/simonw/sqlite-utils/issues/489,1248475718,IC_kwDOCGYnMM5KajpG,9599,2022-09-15T18:49:05Z,2022-09-15T18:49:53Z,OWNER,"Here's how I used my prototype to build [that Gist](https://gist.github.com/simonw/0e6901974a14ab7d56c2746a04d72c8c):
sqlite-utils memory ~/Downloads/CVR_Export_20220908084311/*.json --schema > database.sql
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1374939463,
https://github.com/simonw/sqlite-utils/issues/489#issuecomment-1248474806,https://api.github.com/repos/simonw/sqlite-utils/issues/489,1248474806,IC_kwDOCGYnMM5Kaja2,9599,2022-09-15T18:48:09Z,2022-09-15T18:48:09Z,OWNER,"Built a prototype of this that works really well:
```diff
diff --git a/sqlite_utils/utils.py b/sqlite_utils/utils.py
index c0b7bf1..f9a482c 100644
--- a/sqlite_utils/utils.py
+++ b/sqlite_utils/utils.py
@@ -272,7 +272,19 @@ def rows_from_file(
if format == Format.JSON:
decoded = json.load(fp)
if isinstance(decoded, dict):
- decoded = [decoded]
+ # TODO: Solve for if this isn't what people want
+ # Does it have just one key that is a list of dicts?
+ list_keys = [
+ k
+ for k in decoded
+ if isinstance(decoded[k], list)
+ and decoded[k]
+ and all(isinstance(o, dict) for o in decoded[k])
+ ]
+ if len(list_keys) == 1:
+ decoded = decoded[list_keys[0]]
+ else:
+ decoded = [decoded]
if not isinstance(decoded, list):
raise RowsFromFileBadJSON(""JSON must be a list or a dictionary"")
return decoded, Format.JSON
```
I used that to build this: https://gist.github.com/simonw/0e6901974a14ab7d56c2746a04d72c8c
One problem though: right now, if you do this, `sqlite-utils` treats it as a single object and adds a `tags` column with JSON in it:
```
echo '{""title"": ""Hi"", ""tags"": [{""t"": ""one""}]}' | sqlite-utils insert db.db t -
```
If I implement this new mechanism the above line would behave differently - which would be a backwards incompatible change.
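
To make the difference concrete, here is a minimal Python sketch comparing the two behaviors. The helper names `current_rows` and `prototype_rows` are hypothetical, written just for this illustration; they mirror the logic in the diff above rather than the real `rows_from_file` signature:

```python
def current_rows(decoded):
    # Existing behavior: a top-level object is treated as a single row
    if isinstance(decoded, dict):
        decoded = [decoded]
    return decoded


def prototype_rows(decoded):
    # Prototype behavior: if exactly one key holds a non-empty list of
    # objects, use that list as the rows instead of the outer object
    if isinstance(decoded, dict):
        list_keys = [
            k
            for k in decoded
            if isinstance(decoded[k], list)
            and decoded[k]
            and all(isinstance(o, dict) for o in decoded[k])
        ]
        if len(list_keys) == 1:
            return decoded[list_keys[0]]
        return [decoded]
    return decoded


doc = {'title': 'Hi', 'tags': [{'t': 'one'}]}
old_rows = current_rows(doc)
new_rows = prototype_rows(doc)
```

With this input `old_rows` is one row with `title` and `tags` keys, while `new_rows` is the single row from inside `tags`, so the `title` value is silently dropped.
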
So I probably need some kind of opt-in mechanism for this. And I need a good name for it.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1374939463,
https://github.com/simonw/datasette/issues/1810#issuecomment-1248290151,https://api.github.com/repos/simonw/datasette/issues/1810,1248290151,IC_kwDOBm6k_c5KZ2Vn,9599,2022-09-15T15:51:04Z,2022-09-15T15:51:25Z,OWNER,I could prototype this idea as a `datasette-featured-tables` plugin that delivers its own custom `index.html` template.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1374626873,
https://github.com/simonw/datasette/issues/1810#issuecomment-1248289857,https://api.github.com/repos/simonw/datasette/issues/1810,1248289857,IC_kwDOBm6k_c5KZ2RB,9599,2022-09-15T15:50:46Z,2022-09-15T15:50:46Z,OWNER,"Idea: allow the user to specify one or more featured tables. Each table is then shown as a summary on the homepage - with the total number of rows and the first 5 rows. If the table has search configured there's a search box too.
If the instance has only one database with only one table (excluding hidden tables) it gets featured automatically perhaps (maybe with a way to opt-out of that if you want to).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1374626873,
https://github.com/simonw/datasette/issues/1810#issuecomment-1248187089,https://api.github.com/repos/simonw/datasette/issues/1810,1248187089,IC_kwDOBm6k_c5KZdLR,9599,2022-09-15T14:31:36Z,2022-09-15T14:31:36Z,OWNER,Twitter conversation that inspired this issue: https://twitter.com/psychemedia/status/1570410108785684481,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1374626873,