22 rows where issue = 921878733 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions issue performed_via_github_app
864476167 https://github.com/simonw/sqlite-utils/issues/272#issuecomment-864476167 https://api.github.com/repos/simonw/sqlite-utils/issues/272 MDEyOklzc3VlQ29tbWVudDg2NDQ3NjE2Nw== simonw 9599 2021-06-19T23:36:48Z 2021-06-19T23:36:48Z OWNER
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Idea: import CSV to memory, run SQL, export in a single command 921878733  
864101267 https://github.com/simonw/sqlite-utils/issues/272#issuecomment-864101267 https://api.github.com/repos/simonw/sqlite-utils/issues/272 MDEyOklzc3VlQ29tbWVudDg2NDEwMTI2Nw== simonw 9599 2021-06-18T15:01:41Z 2021-06-18T15:01:41Z OWNER

I'll split the remaining work out into separate issues.

862491016 https://github.com/simonw/sqlite-utils/issues/272#issuecomment-862491016 https://api.github.com/repos/simonw/sqlite-utils/issues/272 MDEyOklzc3VlQ29tbWVudDg2MjQ5MTAxNg== simonw 9599 2021-06-16T15:46:13Z 2021-06-16T15:46:13Z OWNER

Columns in data imported from CSV this way are currently treated as TEXT, which means numeric sorts and the like won't work as people might expect. It would be good to add automatic type detection here, see #179.
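
A minimal sketch of the kind of per-column type detection being proposed (`detect_type` is a hypothetical helper, not part of sqlite-utils):

```python
def detect_type(values):
    """Guess the narrowest SQLite type that fits every string in a column."""
    for sqlite_type, caster in (("INTEGER", int), ("REAL", float)):
        try:
            for value in values:
                caster(value)
        except ValueError:
            continue  # at least one value doesn't fit; try the next type
        else:
            return sqlite_type
    return "TEXT"
```

Columns detected as INTEGER or REAL could then be cast before they are inserted into the in-memory table.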

862485408 https://github.com/simonw/sqlite-utils/issues/272#issuecomment-862485408 https://api.github.com/repos/simonw/sqlite-utils/issues/272 MDEyOklzc3VlQ29tbWVudDg2MjQ4NTQwOA== simonw 9599 2021-06-16T15:38:58Z 2021-06-16T15:39:28Z OWNER

Also sqlite-utils memory reflects the existing sqlite-utils :memory: mechanism, which is a point in its favour.

And it helps emphasize that the file you are querying will be loaded into memory, so probably don't try this against a 1GB CSV file.

862484557 https://github.com/simonw/sqlite-utils/issues/272#issuecomment-862484557 https://api.github.com/repos/simonw/sqlite-utils/issues/272 MDEyOklzc3VlQ29tbWVudDg2MjQ4NDU1Nw== simonw 9599 2021-06-16T15:37:51Z 2021-06-16T15:38:34Z OWNER

I wonder if there's a better name for this than sqlite-utils memory?

  • sqlite-utils memory hello.csv "select * from hello"
  • sqlite-utils mem hello.csv "select * from hello"
  • sqlite-utils temp hello.csv "select * from hello"
  • sqlite-utils adhoc hello.csv "select * from hello"
  • sqlite-utils scratch hello.csv "select * from hello"

I think memory is best. I don't like the others, except for scratch which is OK.

862479704 https://github.com/simonw/sqlite-utils/issues/272#issuecomment-862479704 https://api.github.com/repos/simonw/sqlite-utils/issues/272 MDEyOklzc3VlQ29tbWVudDg2MjQ3OTcwNA== simonw 9599 2021-06-16T15:31:31Z 2021-06-16T15:31:31Z OWNER

Plus, could I make this change to sqlite-utils query without breaking backwards compatibility? Adding a new sqlite-utils memory command is completely safe from that perspective.

862478881 https://github.com/simonw/sqlite-utils/issues/272#issuecomment-862478881 https://api.github.com/repos/simonw/sqlite-utils/issues/272 MDEyOklzc3VlQ29tbWVudDg2MjQ3ODg4MQ== simonw 9599 2021-06-16T15:30:24Z 2021-06-16T15:30:24Z OWNER

But... sqlite-utils my.csv "select * from my" is a much more compelling initial experience than sqlite-utils memory my.csv "select * from my".

862475685 https://github.com/simonw/sqlite-utils/issues/272#issuecomment-862475685 https://api.github.com/repos/simonw/sqlite-utils/issues/272 MDEyOklzc3VlQ29tbWVudDg2MjQ3NTY4NQ== simonw 9599 2021-06-16T15:26:19Z 2021-06-16T15:29:38Z OWNER

Here's a radical idea: what if I combined sqlite-utils memory into sqlite-utils query?

The trick here would be to detect if the arguments passed on the command-line refer to SQLite databases or if they refer to CSV/JSON data that should be imported into temporary tables.

Detecting a SQLite database file is actually really easy - they all start with the same binary string:

>>> open("my.db", "rb").read(100)
b'SQLite format 3\x00...

(Need to carefully check that a CSV file with SQLite format 3 as the first column name doesn't accidentally get interpreted as a SQLite DB though).
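
That header check can be sketched like this (`looks_like_sqlite` is a hypothetical helper; the full magic string is the 16 bytes b'SQLite format 3\x00'):

```python
def looks_like_sqlite(path):
    """Return True if the file starts with the 16-byte SQLite magic header."""
    with open(path, "rb") as fp:
        return fp.read(16) == b"SQLite format 3\x00"
```

The trailing NUL byte in the header can't occur in a valid CSV file, which guards against the pathological "SQLite format 3" column-name case mentioned above.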

So then what would the semantics of sqlite-utils query (which is also the default command) be?

  • sqlite-utils mydb.db "select * from x"
  • sqlite-utils my.csv "select * from my"
  • sqlite-utils mydb.db my.csv "select * from mydb.x join my on ..." - this is where it gets weird. We can't import the CSV data directly into mydb.db - it's supposed to go into the in-memory database - so now we need to start using database aliases like mydb.x because we passed at least one other file?

The complexity here is definitely in the handling of a combination of SQLite database files and CSV filenames. Also, sqlite-utils query doesn't accept multiple filenames at the moment, so that would have to change.

I'm not 100% sold on this as being better than having a separate sqlite-utils memory command, as seen in #273.

862040971 https://github.com/simonw/sqlite-utils/issues/272#issuecomment-862040971 https://api.github.com/repos/simonw/sqlite-utils/issues/272 MDEyOklzc3VlQ29tbWVudDg2MjA0MDk3MQ== simonw 9599 2021-06-16T05:02:56Z 2021-06-16T05:02:56Z OWNER

Moving this to a PR.

862040906 https://github.com/simonw/sqlite-utils/issues/272#issuecomment-862040906 https://api.github.com/repos/simonw/sqlite-utils/issues/272 MDEyOklzc3VlQ29tbWVudDg2MjA0MDkwNg== simonw 9599 2021-06-16T05:02:47Z 2021-06-16T05:02:47Z OWNER

Got a prototype working!

 % curl -s 'https://fivethirtyeight.datasettes.com/polls/president_approval_polls.csv?_size=max&_stream=1' | sqlite-utils memory - 'select * from t limit 5' --nl 
{"rowid": "1", "question_id": "139304", "poll_id": "74225", "state": "", "politician_id": "11", "politician": "Donald Trump", "pollster_id": "568", "pollster": "YouGov", "sponsor_ids": "352", "sponsors": "Economist", "display_name": "YouGov", "pollster_rating_id": "391", "pollster_rating_name": "YouGov", "fte_grade": "B", "sample_size": "1500", "population": "a", "population_full": "a", "methodology": "Online", "start_date": "1/16/21", "end_date": "1/19/21", "sponsor_candidate": "", "tracking": "", "created_at": "1/20/21 10:18", "notes": "", "url": "https://docs.cdn.yougov.com/y9zsit5bzd/weeklytrackingreport.pdf", "source": "538", "yes": "42.0", "no": "53.0"}
{"rowid": "2", "question_id": "139305", "poll_id": "74225", "state": "", "politician_id": "11", "politician": "Donald Trump", "pollster_id": "568", "pollster": "YouGov", "sponsor_ids": "352", "sponsors": "Economist", "display_name": "YouGov", "pollster_rating_id": "391", "pollster_rating_name": "YouGov", "fte_grade": "B", "sample_size": "1155", "population": "rv", "population_full": "rv", "methodology": "Online", "start_date": "1/16/21", "end_date": "1/19/21", "sponsor_candidate": "", "tracking": "", "created_at": "1/20/21 10:18", "notes": "", "url": "https://docs.cdn.yougov.com/y9zsit5bzd/weeklytrackingreport.pdf", "source": "538", "yes": "44.0", "no": "55.0"}
{"rowid": "3", "question_id": "139306", "poll_id": "74226", "state": "", "politician_id": "11", "politician": "Donald Trump", "pollster_id": "23", "pollster": "American Research Group", "sponsor_ids": "", "sponsors": "", "display_name": "American Research Group", "pollster_rating_id": "9", "pollster_rating_name": "American Research Group", "fte_grade": "B", "sample_size": "1100", "population": "a", "population_full": "a", "methodology": "Live Phone", "start_date": "1/16/21", "end_date": "1/19/21", "sponsor_candidate": "", "tracking": "", "created_at": "1/20/21 10:18", "notes": "", "url": "https://americanresearchgroup.com/economy/", "source": "538", "yes": "30.0", "no": "66.0"}
{"rowid": "4", "question_id": "139307", "poll_id": "74226", "state": "", "politician_id": "11", "politician": "Donald Trump", "pollster_id": "23", "pollster": "American Research Group", "sponsor_ids": "", "sponsors": "", "display_name": "American Research Group", "pollster_rating_id": "9", "pollster_rating_name": "American Research Group", "fte_grade": "B", "sample_size": "990", "population": "rv", "population_full": "rv", "methodology": "Live Phone", "start_date": "1/16/21", "end_date": "1/19/21", "sponsor_candidate": "", "tracking": "", "created_at": "1/20/21 10:18", "notes": "", "url": "https://americanresearchgroup.com/economy/", "source": "538", "yes": "29.0", "no": "67.0"}
{"rowid": "5", "question_id": "139298", "poll_id": "74224", "state": "", "politician_id": "11", "politician": "Donald Trump", "pollster_id": "1528", "pollster": "AtlasIntel", "sponsor_ids": "", "sponsors": "", "display_name": "AtlasIntel", "pollster_rating_id": "546", "pollster_rating_name": "AtlasIntel", "fte_grade": "B/C", "sample_size": "5188", "population": "a", "population_full": "a", "methodology": "Online", "start_date": "1/15/21", "end_date": "1/19/21", "sponsor_candidate": "", "tracking": "", "created_at": "1/19/21 21:52", "notes": "", "url": "https://projects.fivethirtyeight.com/polls/20210119_US_Atlas2.pdf", "source": "538", "yes": "44.6", "no": "53.9"}
862018937 https://github.com/simonw/sqlite-utils/issues/272#issuecomment-862018937 https://api.github.com/repos/simonw/sqlite-utils/issues/272 MDEyOklzc3VlQ29tbWVudDg2MjAxODkzNw== simonw 9599 2021-06-16T03:59:28Z 2021-06-16T04:00:05Z OWNER

Mainly for debugging purposes it would be useful to be able to save the created in-memory database back to a file again later. This could be done with:

sqlite-utils memory blah.csv --save saved.db

Can use .iterdump() to implement this: https://docs.python.org/3/library/sqlite3.html#sqlite3.Connection.iterdump

Maybe instead (or as well as) offer --dump, which dumps out the SQL from that.
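
A sketch of how --save could work using .iterdump() (the table contents and destination path here are illustrative, not sqlite-utils code):

```python
import os
import sqlite3
import tempfile

def save_memory_db(mem_conn, path):
    """Replay an in-memory database's SQL dump into a file-backed copy."""
    disk = sqlite3.connect(path)
    disk.executescript("\n".join(mem_conn.iterdump()))
    disk.close()

# Stand-in for the database built by importing blah.csv:
mem = sqlite3.connect(":memory:")
mem.execute("CREATE TABLE blah (id INTEGER, name TEXT)")
mem.execute("INSERT INTO blah VALUES (1, 'hello')")
mem.commit()

dest = os.path.join(tempfile.mkdtemp(), "saved.db")  # stand-in for saved.db
save_memory_db(mem, dest)
```

The --dump variant would simply write the joined iterdump() output to stdout instead of executing it against a file.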

861989987 https://github.com/simonw/sqlite-utils/issues/272#issuecomment-861989987 https://api.github.com/repos/simonw/sqlite-utils/issues/272 MDEyOklzc3VlQ29tbWVudDg2MTk4OTk4Nw== simonw 9599 2021-06-16T02:34:21Z 2021-06-16T02:34:21Z OWNER

The documentation already covers this:

$ sqlite-utils :memory: "select sqlite_version()"
[{"sqlite_version()": "3.29.0"}]

https://sqlite-utils.datasette.io/en/latest/cli.html#running-queries-and-returning-json

sqlite-utils memory "select sqlite_version()" is a little bit more intuitive than that.

861987651 https://github.com/simonw/sqlite-utils/issues/272#issuecomment-861987651 https://api.github.com/repos/simonw/sqlite-utils/issues/272 MDEyOklzc3VlQ29tbWVudDg2MTk4NzY1MQ== simonw 9599 2021-06-16T02:27:20Z 2021-06-16T02:27:20Z OWNER

Solution: sqlite-utils memory - attempts to detect the input format: if it starts with a { or [ it's likely JSON; if it doesn't, use the csv.Sniffer() mechanism. Or you can use sqlite-utils memory -:csv to specifically indicate the type of input.
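
That detection logic might look something like this (`detect_format` is a hypothetical helper):

```python
import csv

def detect_format(sample):
    """Guess whether a text sample is JSON, CSV or TSV."""
    stripped = sample.lstrip()
    if stripped.startswith(("{", "[")):
        return "json"
    # csv.Sniffer examines delimiter frequency across the sample rows
    dialect = csv.Sniffer().sniff(stripped)
    return "tsv" if dialect.delimiter == "\t" else "csv"
```
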

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
861985944 https://github.com/simonw/sqlite-utils/issues/272#issuecomment-861985944 https://api.github.com/repos/simonw/sqlite-utils/issues/272 MDEyOklzc3VlQ29tbWVudDg2MTk4NTk0NA== simonw 9599 2021-06-16T02:22:52Z 2021-06-16T02:22:52Z OWNER

Another option: allow an optional :suffix specifying the type of the file. If this is missing we detect based on the filename.

sqlite-utils memory somefile:csv "select * from somefile"

One catch: how to treat - for standard input?

cat blah.csv | sqlite-utils memory - "select * from stdin"

That's fine for CSV, but what about TSV or JSON or nl-JSON? Maybe this:

cat blah.csv | sqlite-utils memory -:json "select * from stdin"

Bit weird though. The alternative would be to support this:

cat blah.csv | sqlite-utils memory --load-csv -

But that's verbose compared to the version without the long --load-x option.
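
A sketch of parsing the :suffix syntax (`parse_source` is hypothetical; a real implementation would also need to cope with Windows paths containing drive-letter colons):

```python
FORMATS = {"csv", "tsv", "json", "nl"}

def parse_source(arg):
    """Split 'path:format' into (path, format); format defaults to None."""
    path, sep, suffix = arg.rpartition(":")
    if sep and suffix in FORMATS:
        return path, suffix
    return arg, None
```

With this, "somefile:csv" and "-:json" both parse cleanly, while a plain "blah.csv" falls back to filename-based detection.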

861984707 https://github.com/simonw/sqlite-utils/issues/272#issuecomment-861984707 https://api.github.com/repos/simonw/sqlite-utils/issues/272 MDEyOklzc3VlQ29tbWVudDg2MTk4NDcwNw== simonw 9599 2021-06-16T02:19:48Z 2021-06-16T02:19:48Z OWNER

This is going to need to be a separate command, for relatively non-obvious reasons.

sqlite-utils blah.db "select * from x"

Is equivalent to this, because query is the default sub-command:

sqlite-utils query blah.db "select * from x"

But... this means that making the filename optional doesn't actually work - because then this is ambiguous:

sqlite-utils --load-csv blah.csv "select * from blah"

So instead, I'm going to add a new sub-command. I'm currently thinking memory to reflect that this command operates on an in-memory database:

sqlite-utils memory --load-csv blah.csv "select * from blah"

I still think I need to use --load-csv rather than --csv because one interesting use-case for this is loading in CSV and converting it to JSON, or vice-versa.

Another option: allow multiple arguments which are filenames, and use the extension (or sniff the content) to decide what to do with them:

sqlite-utils memory blah.csv foo.csv "select * from foo join blah on ..."

This would require the last positional argument to always be a SQL query, and would treat all other positional arguments as files that should be imported into memory.
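
The positional-argument handling described above could be sketched as (`plan_memory_command` is hypothetical):

```python
from pathlib import Path

def plan_memory_command(args):
    """Treat the last positional argument as the SQL query and every other
    argument as a file to import, with its table named after the file stem."""
    *files, sql = args
    return {Path(f).stem: f for f in files}, sql
```
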

861944202 https://github.com/simonw/sqlite-utils/issues/272#issuecomment-861944202 https://api.github.com/repos/simonw/sqlite-utils/issues/272 MDEyOklzc3VlQ29tbWVudDg2MTk0NDIwMg== eyeseast 25778 2021-06-16T01:41:03Z 2021-06-16T01:41:03Z NONE

So, I do things like this a lot, too. I like the idea of piping in from stdin. Something like this would be nice to do in a makefile:

cat file.csv | sqlite-utils --csv --table data - 'SELECT * FROM data WHERE col="whatever"' > filtered.csv

If you assumed that you're always piping out the same format you're piping in, the option names don't have to change. Depends how much you want to change formats.

861891835 https://github.com/simonw/sqlite-utils/issues/272#issuecomment-861891835 https://api.github.com/repos/simonw/sqlite-utils/issues/272 MDEyOklzc3VlQ29tbWVudDg2MTg5MTgzNQ== simonw 9599 2021-06-15T23:09:31Z 2021-06-15T23:09:31Z OWNER

--load-csv and --load-json and --load-nl and --load-tsv are unambiguous.

861891693 https://github.com/simonw/sqlite-utils/issues/272#issuecomment-861891693 https://api.github.com/repos/simonw/sqlite-utils/issues/272 MDEyOklzc3VlQ29tbWVudDg2MTg5MTY5Mw== simonw 9599 2021-06-15T23:09:08Z 2021-06-15T23:09:08Z OWNER

Problem: --csv and --json and --nl are already options for sqlite-utils query - need new non-conflicting names.

861891272 https://github.com/simonw/sqlite-utils/issues/272#issuecomment-861891272 https://api.github.com/repos/simonw/sqlite-utils/issues/272 MDEyOklzc3VlQ29tbWVudDg2MTg5MTI3Mg== simonw 9599 2021-06-15T23:08:02Z 2021-06-15T23:08:02Z OWNER

--csv - should work though, for reading from stdin. The table can be called stdin.

861891110 https://github.com/simonw/sqlite-utils/issues/272#issuecomment-861891110 https://api.github.com/repos/simonw/sqlite-utils/issues/272 MDEyOklzc3VlQ29tbWVudDg2MTg5MTExMA== simonw 9599 2021-06-15T23:07:38Z 2021-06-15T23:07:38Z OWNER

--csvt seems unnecessary to me, although it does mean that people who want to load different CSV files with the same filename (but in different directories) will get an error unless they rename the files first.

861890689 https://github.com/simonw/sqlite-utils/issues/272#issuecomment-861890689 https://api.github.com/repos/simonw/sqlite-utils/issues/272 MDEyOklzc3VlQ29tbWVudDg2MTg5MDY4OQ== simonw 9599 2021-06-15T23:06:37Z 2021-06-15T23:06:37Z OWNER

How about --json and --nl and --tsv too? Imitating the format options for sqlite-utils insert.

And what happens if you provide a filename too? I'm tempted to say that the --csv stuff still gets loaded into an in-memory database but it's given a name and can then be joined against using SQLite memory.blah syntax.

861889437 https://github.com/simonw/sqlite-utils/issues/272#issuecomment-861889437 https://api.github.com/repos/simonw/sqlite-utils/issues/272 MDEyOklzc3VlQ29tbWVudDg2MTg4OTQzNw== simonw 9599 2021-06-15T23:03:26Z 2021-06-15T23:03:26Z OWNER

Maybe also support --csvt as an alternative option which takes two arguments: the CSV path and the name of the table that should be created from it (rather than auto-detecting from the filename).


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
, [performed_via_github_app] TEXT);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);