issue_comments

14 rows where issue = 455486286 sorted by updated_at descending

user 4

  • simonw 10
  • fgregg 2
  • nileshtrivedi 1
  • izzues 1

author_association 3

  • OWNER 10
  • CONTRIBUTOR 2
  • NONE 2

issue 1

  • Mechanism for turning nested JSON into foreign keys / many-to-many · 14
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions issue performed_via_github_app
1170595021 https://github.com/simonw/sqlite-utils/issues/26#issuecomment-1170595021 https://api.github.com/repos/simonw/sqlite-utils/issues/26 IC_kwDOCGYnMM5FxdzN izzues 60892516 2022-06-29T23:35:29Z 2022-06-29T23:35:29Z NONE

Have you seen MakeTypes? Not the exact same thing but it may be relevant.

And it's inspired by the paper "Types from Data: Making Structured Data First-Class Citizens in F#".

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Mechanism for turning nested JSON into foreign keys / many-to-many 455486286  
1141711418 https://github.com/simonw/sqlite-utils/issues/26#issuecomment-1141711418 https://api.github.com/repos/simonw/sqlite-utils/issues/26 IC_kwDOCGYnMM5EDSI6 nileshtrivedi 19304 2022-05-31T06:21:15Z 2022-05-31T06:21:15Z NONE

I ran into this. My use case has a JSON file with an array of book objects, each with a key called reviews which is also an array of objects. My JSON is human-edited and does not specify IDs for either books or reviews. Because sqlite-utils does not support inserting nested objects, I instead have to maintain two separate CSV files, with an id column in books.csv and a book_id column in reviews.csv.

I think the right way to declare the relationship while inserting JSON might be to describe it on the command line:

sqlite-utils insert data.db books mydata.json --hasmany reviews --hasone author --manytomany tags

This relies on the assumption that foreign keys can point to a rowid primary key.
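
A minimal sketch of that idea, using only Python's standard sqlite3 module rather than sqlite-utils itself. The table layout, the body column and the insert_books helper are assumptions made for illustration. Each table declares id INTEGER PRIMARY KEY, which SQLite treats as an alias for rowid, so the generated row ID can serve as the foreign key target even though the JSON supplies no IDs.

```python
import sqlite3


def insert_books(conn, books):
    """Insert books and their nested reviews, linking each review via book_id."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS books (id INTEGER PRIMARY KEY, title TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reviews ("
        "id INTEGER PRIMARY KEY, "
        "book_id INTEGER REFERENCES books(id), "
        "body TEXT)"
    )
    for book in books:
        cur = conn.execute("INSERT INTO books (title) VALUES (?)", (book["title"],))
        book_id = cur.lastrowid  # ID generated by SQLite for the new book row
        conn.executemany(
            "INSERT INTO reviews (book_id, body) VALUES (?, ?)",
            [(book_id, review["body"]) for review in book.get("reviews", [])],
        )
    conn.commit()


# Hypothetical sample data: no IDs are present in the JSON.
conn = sqlite3.connect(":memory:")
insert_books(conn, [
    {"title": "Book A", "reviews": [{"body": "Great"}, {"body": "Fine"}]},
    {"title": "Book B", "reviews": []},
])
```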

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Mechanism for turning nested JSON into foreign keys / many-to-many 455486286  
1032120014 https://github.com/simonw/sqlite-utils/issues/26#issuecomment-1032120014 https://api.github.com/repos/simonw/sqlite-utils/issues/26 IC_kwDOCGYnMM49hObO fgregg 536941 2022-02-08T01:32:34Z 2022-02-08T01:32:34Z CONTRIBUTOR

if you are curious about prior art, https://github.com/jsnell/json-to-multicsv is really good!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Mechanism for turning nested JSON into foreign keys / many-to-many 455486286  
964205475 https://github.com/simonw/sqlite-utils/issues/26#issuecomment-964205475 https://api.github.com/repos/simonw/sqlite-utils/issues/26 IC_kwDOCGYnMM45eJuj fgregg 536941 2021-11-09T14:31:29Z 2021-11-09T14:31:29Z CONTRIBUTOR

I was just reaching for a tool to do this this morning.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Mechanism for turning nested JSON into foreign keys / many-to-many 455486286  
696566750 https://github.com/simonw/sqlite-utils/issues/26#issuecomment-696566750 https://api.github.com/repos/simonw/sqlite-utils/issues/26 MDEyOklzc3VlQ29tbWVudDY5NjU2Njc1MA== simonw 9599 2020-09-22T07:55:00Z 2020-09-22T07:55:00Z OWNER

Problem: extract means something else now, see #47 and the upcoming work in #42.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Mechanism for turning nested JSON into foreign keys / many-to-many 455486286  
507051670 https://github.com/simonw/sqlite-utils/issues/26#issuecomment-507051670 https://api.github.com/repos/simonw/sqlite-utils/issues/26 MDEyOklzc3VlQ29tbWVudDUwNzA1MTY3MA== simonw 9599 2019-06-30T17:04:09Z 2019-06-30T17:04:09Z OWNER

I think the implementation of this will benefit from #23 (syntactic sugar for creating m2m records)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Mechanism for turning nested JSON into foreign keys / many-to-many 455486286  
501541902 https://github.com/simonw/sqlite-utils/issues/26#issuecomment-501541902 https://api.github.com/repos/simonw/sqlite-utils/issues/26 MDEyOklzc3VlQ29tbWVudDUwMTU0MTkwMg== simonw 9599 2019-06-13T04:15:22Z 2019-06-13T16:55:42Z OWNER

So maybe something like this:

curl https://api.github.com/repos/simonw/datasette/pulls?state=all | \
  sqlite-utils insert git.db pulls - \
    --flatten=base \
    --flatten=head \
    --extract=user:users:id \
    --extract=head_repo.license:licenses:key \
    --extract=head_repo.owner:users \
    --extract=head_repo \
    --extract=base_repo.license:licenses:key \
    --extract=base_repo.owner:users \
    --extract=base_repo

I wonder: is the order of those nested --extract lines significant? It would be nice if the order didn't matter and the code figured out the right execution plan on its own.
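
One possible way to make the order irrelevant, sketched in Python. This assumes that processing the most deeply nested paths first is a valid execution plan, which the comment above leaves as an open question. order_extracts is a hypothetical helper, not part of sqlite-utils.

```python
def order_extracts(paths):
    """Sort extract paths so nested paths come before the paths that contain them.

    Depth is measured by counting dots; the sort is stable, so paths of equal
    depth keep the order in which they were given.
    """
    return sorted(paths, key=lambda path: path.count("."), reverse=True)


# The paths from the proposed command, deliberately given out of order.
ordered = order_extracts([
    "head_repo",
    "user",
    "head_repo.owner",
    "base_repo",
    "base_repo.license",
    "head_repo.license",
    "base_repo.owner",
])
```

With this ordering head_repo.owner is pulled out before head_repo, so the repo row that ends up in its own table would already carry a reference to its owner rather than a nested object.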

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
Mechanism for turning nested JSON into foreign keys / many-to-many 455486286  
501543688 https://github.com/simonw/sqlite-utils/issues/26#issuecomment-501543688 https://api.github.com/repos/simonw/sqlite-utils/issues/26 MDEyOklzc3VlQ29tbWVudDUwMTU0MzY4OA== simonw 9599 2019-06-13T04:26:15Z 2019-06-13T04:26:15Z OWNER

I may ignore --flatten for the moment - users can do their own flattening using jq if they need that.

curl https://api.github.com/repos/simonw/datasette/pulls?state=all | jq "
    [.[] | . + {
        base_label: .base.label,
        base_ref: .base.ref,
        base_sha: .base.sha,
        base_user: .base.user,
        base_repo: .base.repo,
        head_label: .head.label,
        head_ref: .head.ref,
        head_sha: .head.sha,
        head_user: .head.user,
        head_repo: .head.repo
    } | del(.base, .head, ._links)]
"

Output: https://gist.github.com/simonw/2703ed43fcfe96eb8cfeee7b558b61e1

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Mechanism for turning nested JSON into foreign keys / many-to-many 455486286  
501542025 https://github.com/simonw/sqlite-utils/issues/26#issuecomment-501542025 https://api.github.com/repos/simonw/sqlite-utils/issues/26 MDEyOklzc3VlQ29tbWVudDUwMTU0MjAyNQ== simonw 9599 2019-06-13T04:16:10Z 2019-06-13T04:16:42Z OWNER

So for --extract the format is path-to-property:table-to-extract-to:primary-key

If we find an array (as opposed to a direct nested object) at the end of the dotted path, we create an m2m table.

And if primary-key is omitted maybe we do the rowid thing with a foreign key back to ourselves.
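
A rough sketch of how that three-part format could be parsed. ExtractSpec and parse_extract are hypothetical names, and the fallback of using the last path segment as the table name when no table is given (as in --extract=head_repo) is an assumption rather than something decided in this thread.

```python
from typing import NamedTuple, Optional


class ExtractSpec(NamedTuple):
    path: str           # dotted path to the nested property
    table: str          # table the nested records are extracted to
    pk: Optional[str]   # primary key column, or None to fall back to rowid


def parse_extract(spec):
    """Parse 'path-to-property:table-to-extract-to:primary-key'."""
    parts = spec.split(":")
    if len(parts) > 3 or not parts[0]:
        raise ValueError(f"Invalid extract spec: {spec!r}")
    path = parts[0]
    # Assumption: with no table given, name the table after the last path segment.
    table = parts[1] if len(parts) > 1 and parts[1] else path.split(".")[-1]
    pk = parts[2] if len(parts) > 2 and parts[2] else None
    return ExtractSpec(path, table, pk)
```

For example, parse_extract("user:users:id") gives a path of user, a table of users and a primary key of id, while parse_extract("head_repo.owner:users") leaves the primary key as None.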

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Mechanism for turning nested JSON into foreign keys / many-to-many 455486286  
501539452 https://github.com/simonw/sqlite-utils/issues/26#issuecomment-501539452 https://api.github.com/repos/simonw/sqlite-utils/issues/26 MDEyOklzc3VlQ29tbWVudDUwMTUzOTQ1Mg== simonw 9599 2019-06-13T03:59:32Z 2019-06-13T03:59:32Z OWNER

Another complexity from the https://api.github.com/repos/simonw/datasette/pulls example:

We don't actually want head and base to be pulled out into a separate table. Our ideal table design would probably look something like this:

  • url: ...
  • id: 285698310
  • ...
  • user_id: 9599 -> refs users
  • head_label: simonw:travis-38dev
  • head_ref: travis-38dev
  • head_sha: f274f9004302c5ca75ce89d0abfd648457957e31
  • head_user_id: 9599 -> refs users
  • head_repo_id: 107914493 -> refs repos
  • base_label: simonw:master
  • base_ref: master
  • base_sha: 5e8fbf7f6fbc0b63d0479da3806dd9ccd6aaa945
  • base_user_id: 9599 -> refs users
  • base_repo_id: 107914493 -> refs repos

So the nested head and base sections here, instead of being extracted into another table, were flattened into their own columns.

So perhaps we need a flatten-nested-into-columns mechanism which can be used in conjunction with an extract-to-tables mechanism.
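
A minimal Python sketch of the flattening half, assuming a one-level flatten where each chosen key becomes a prefix on its children. flatten is a hypothetical helper and the sample record is trimmed down from the table design above.

```python
def flatten(record, keys):
    """Copy record, replacing each nested dict named in keys with prefixed columns."""
    flat = {}
    for key, value in record.items():
        if key in keys and isinstance(value, dict):
            for sub_key, sub_value in value.items():
                flat[f"{key}_{sub_key}"] = sub_value
        else:
            flat[key] = value
    return flat


pull = {
    "id": 285698310,
    "head": {"label": "simonw:travis-38dev", "ref": "travis-38dev"},
    "base": {"label": "simonw:master", "ref": "master"},
}
flat = flatten(pull, {"head", "base"})
```

Any objects still nested after this step, such as head_user or head_repo in the real API response, would be left for the extract-to-tables mechanism to handle.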

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Mechanism for turning nested JSON into foreign keys / many-to-many 455486286  
501538100 https://github.com/simonw/sqlite-utils/issues/26#issuecomment-501538100 https://api.github.com/repos/simonw/sqlite-utils/issues/26 MDEyOklzc3VlQ29tbWVudDUwMTUzODEwMA== simonw 9599 2019-06-13T03:51:27Z 2019-06-13T03:51:27Z OWNER

I like the term "extract" for what we are doing here, partly because that's the terminology I used in csvs-to-sqlite.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Mechanism for turning nested JSON into foreign keys / many-to-many 455486286  
501537812 https://github.com/simonw/sqlite-utils/issues/26#issuecomment-501537812 https://api.github.com/repos/simonw/sqlite-utils/issues/26 MDEyOklzc3VlQ29tbWVudDUwMTUzNzgxMg== simonw 9599 2019-06-13T03:49:37Z 2019-06-13T03:50:39Z OWNER

There's an interesting difference here between nested objects with a primary-key style ID and nested objects without.

If a nested object does not have a primary key, we could still shift it out to another table but it would need to be in a context where it has an automatic foreign key back to our current record.

A good example of something where that would be useful is the outageDevices key in https://github.com/simonw/pge-outages/blob/d890d09ff6e2997948028528e06c82e1efe30365/pge-outages.json#L13-L25

{
    "outageNumber": "407367",
    "outageStartTime": "1560355216",
    "crewCurrentStatus": "PG&E repair crew is on-site working to restore power.",
    "currentEtor": "1560376800",
    "cause": "Our preliminary determination is that your outage was caused by scheduled maintenance work.",
    "estCustAffected": "3",
    "lastUpdateTime": "1560355709",
    "hazardFlag": "0",
    "latitude": "37.35629",
    "longitude": "-119.70469",
    "outageDevices": [
        {
            "latitude": "37.35409",
            "longitude": "-119.70575"
        },
        {
            "latitude": "37.35463",
            "longitude": "-119.70525"
        },
        {
            "latitude": "37.35562",
            "longitude": "-119.70467"
        }
    ],
    "regionName": "Ahwahnee"
}

These could either be inserted into an outageDevices table that uses rowid... or we could have a mechanism where we automatically derive a primary key for them based on a hash of their data, hence avoiding creating duplicates even though we don't have a provided primary key.
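
The hash-based option could look roughly like this in Python. The choice of SHA-1 over a canonical JSON encoding is an assumption for illustration, and hash_id is a hypothetical helper name.

```python
import hashlib
import json


def hash_id(record):
    """Derive a stable primary key from a record's contents.

    Keys are sorted before hashing so that two records with the same data
    produce the same ID regardless of key order.
    """
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


device = {"latitude": "37.35409", "longitude": "-119.70575"}
device_id = hash_id(device)
```

Inserting the same device twice would then hit the same primary key, so an upsert or INSERT OR IGNORE would avoid the duplicate.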

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Mechanism for turning nested JSON into foreign keys / many-to-many 455486286  
501536495 https://github.com/simonw/sqlite-utils/issues/26#issuecomment-501536495 https://api.github.com/repos/simonw/sqlite-utils/issues/26 MDEyOklzc3VlQ29tbWVudDUwMTUzNjQ5NQ== simonw 9599 2019-06-13T03:40:21Z 2019-06-13T03:40:21Z OWNER

I think I can do something here with a very simple head.repo.owner path syntax. Normally this kind of syntax would have to take the difference between dictionaries and lists into account but I don't think that matters here.
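
A minimal sketch of resolving such a path against one record in Python. resolve_path is a hypothetical helper. It walks dictionaries only, in line with the idea above that lists do not need special handling along the path, and it returns whatever sits at the end (a nested object or a list).

```python
def resolve_path(record, path):
    """Follow a dotted path such as 'head.repo.owner' through nested dicts.

    Returns None if any segment is missing or a non-dict is hit mid-path.
    """
    current = record
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


pull = {"head": {"repo": {"owner": {"id": 9599, "login": "simonw"}}}}
owner = resolve_path(pull, "head.repo.owner")
```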

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Mechanism for turning nested JSON into foreign keys / many-to-many 455486286  
501508302 https://github.com/simonw/sqlite-utils/issues/26#issuecomment-501508302 https://api.github.com/repos/simonw/sqlite-utils/issues/26 MDEyOklzc3VlQ29tbWVudDUwMTUwODMwMg== simonw 9599 2019-06-13T00:57:52Z 2019-06-13T00:57:52Z OWNER

Two challenges here:

  1. We need a way to specify which tables should be used - e.g. "put records from the "user" key in a users table, put multiple records from the "labels" key in a table called labels" (we can pick an automatic name for the m2m table, though it might be nice to have an option to customize it)

  2. Should we deal with nested objects? Consider https://api.github.com/repos/simonw/datasette/pulls for example:

Here we have head.user as a user, head.repo as a repo, and head.repo.owner as another user.

Ideally our mechanism for specifying which table things should be pulled out into would handle this, but it's getting a bit complicated.
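
To make the first challenge concrete, here is a hand-written version of the target outcome using Python's standard sqlite3 module. The table and column names, the issues_labels naming for the m2m table, the insert_issue helper and the sample label data are all assumptions for illustration; the mechanism discussed here would need to derive them from some kind of specification.

```python
import sqlite3


def insert_issue(conn, issue):
    """Store an issue, moving its nested user and labels into their own tables."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, login TEXT);
        CREATE TABLE IF NOT EXISTS labels (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE IF NOT EXISTS issues (
            id INTEGER PRIMARY KEY,
            title TEXT,
            user INTEGER REFERENCES users(id)
        );
        CREATE TABLE IF NOT EXISTS issues_labels (
            issues_id INTEGER REFERENCES issues(id),
            labels_id INTEGER REFERENCES labels(id),
            PRIMARY KEY (issues_id, labels_id)
        );
    """)
    # Single nested object: becomes a row in users plus a foreign key column.
    user = issue["user"]
    conn.execute(
        "INSERT OR REPLACE INTO users (id, login) VALUES (?, ?)",
        (user["id"], user["login"]),
    )
    conn.execute(
        "INSERT OR REPLACE INTO issues (id, title, user) VALUES (?, ?, ?)",
        (issue["id"], issue["title"], user["id"]),
    )
    # Array of nested objects: becomes rows in labels plus m2m link rows.
    for label in issue.get("labels", []):
        conn.execute(
            "INSERT OR REPLACE INTO labels (id, name) VALUES (?, ?)",
            (label["id"], label["name"]),
        )
        conn.execute(
            "INSERT OR IGNORE INTO issues_labels (issues_id, labels_id) VALUES (?, ?)",
            (issue["id"], label["id"]),
        )
    conn.commit()


# Hypothetical sample records; the label IDs and names are made up.
conn = sqlite3.connect(":memory:")
simonw = {"id": 9599, "login": "simonw"}
insert_issue(conn, {"id": 1, "title": "First", "user": simonw,
                    "labels": [{"id": 10, "name": "bug"}, {"id": 11, "name": "docs"}]})
insert_issue(conn, {"id": 2, "title": "Second", "user": simonw,
                    "labels": [{"id": 10, "name": "bug"}]})
```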

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Mechanism for turning nested JSON into foreign keys / many-to-many 455486286  

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id]),
   [performed_via_github_app] TEXT
);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);
Powered by Datasette · Queries took 23.802ms · About: github-to-sqlite