21 rows where issue = 470345929 sorted by updated_at descending

View and edit SQL

Suggested facets: created_at (date), updated_at (date)

user

issue

  • table.extract(...) method and "sqlite-utils extract" command · 21

author_association

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions issue performed_via_github_app
697037974 https://github.com/simonw/sqlite-utils/issues/42#issuecomment-697037974 https://api.github.com/repos/simonw/sqlite-utils/issues/42 MDEyOklzc3VlQ29tbWVudDY5NzAzNzk3NA== simonw 9599 2020-09-22T23:39:31Z 2020-09-22T23:39:31Z OWNER
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
table.extract(...) method and "sqlite-utils extract" command 470345929  
697031174 https://github.com/simonw/sqlite-utils/issues/42#issuecomment-697031174 https://api.github.com/repos/simonw/sqlite-utils/issues/42 MDEyOklzc3VlQ29tbWVudDY5NzAzMTE3NA== simonw 9599 2020-09-22T23:16:00Z 2020-09-22T23:16:00Z OWNER

Trying this demo again:

wget 'https://raw.githubusercontent.com/wri/global-power-plant-database/master/output_database/global_power_plant_database.csv'
sqlite-utils insert global.db power_plants global_power_plant_database.csv --csv
sqlite-utils extract global.db power_plants country country_long --table countries --rename country_long name

It worked!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
table.extract(...) method and "sqlite-utils extract" command 470345929  
697025403 https://github.com/simonw/sqlite-utils/issues/42#issuecomment-697025403 https://api.github.com/repos/simonw/sqlite-utils/issues/42 MDEyOklzc3VlQ29tbWVudDY5NzAyNTQwMw== simonw 9599 2020-09-22T22:57:53Z 2020-09-22T22:57:53Z OWNER
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
table.extract(...) method and "sqlite-utils extract" command 470345929  
697019944 https://github.com/simonw/sqlite-utils/issues/42#issuecomment-697019944 https://api.github.com/repos/simonw/sqlite-utils/issues/42 MDEyOklzc3VlQ29tbWVudDY5NzAxOTk0NA== simonw 9599 2020-09-22T22:40:00Z 2020-09-22T22:40:00Z OWNER

I tried out the prototype of the CLI on the Global Power Plants data:

wget 'https://raw.githubusercontent.com/wri/global-power-plant-database/master/output_database/global_power_plant_database.csv'
sqlite-utils insert global.db power_plants global_power_plant_database.csv --csv
sqlite-utils extract global.db power_plants country country_long

This threw an error because rowid columns are not yet supported. I fixed that like so:

sqlite-utils transform global.db power_plants --rename rowid id
sqlite-utils extract global.db power_plants country country_long

That worked! But it didn't play great with Datasette, because the resulting extracted table had columns country and country_long and neither of those are called name or value or title.

Based on this I need to add rowid table support AND I need to implement the proposed rename= argument for renaming columns on their way into the new table.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
table.extract(...) method and "sqlite-utils extract" command 470345929  
697013681 https://github.com/simonw/sqlite-utils/issues/42#issuecomment-697013681 https://api.github.com/repos/simonw/sqlite-utils/issues/42 MDEyOklzc3VlQ29tbWVudDY5NzAxMzY4MQ== simonw 9599 2020-09-22T22:22:49Z 2020-09-22T22:22:49Z OWNER

The command-line version of this needs to accept a table and one or more columns, then a --table and --fk-column option.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
table.extract(...) method and "sqlite-utils extract" command 470345929  
697012111 https://github.com/simonw/sqlite-utils/issues/42#issuecomment-697012111 https://api.github.com/repos/simonw/sqlite-utils/issues/42 MDEyOklzc3VlQ29tbWVudDY5NzAxMjExMQ== simonw 9599 2020-09-22T22:18:13Z 2020-09-22T22:18:13Z OWNER

Here's how I'm generating the examples for the documentation:

In [2]: import sqlite_utils

In [3]: db = sqlite_utils.Database(memory=True)

In [4]: db["Trees"].insert({"id": 1, "TreeAddress": "52 Vine St", "CommonName":
   ...: "Palm", "LatinName": "foo"}, pk="id")
Out[4]: <Table Trees (id, TreeAddress, CommonName, LatinName)>

In [5]: db["Trees"].extract(["CommonName", "LatinName"], table="Species", fk_col
   ...: umn="species_id")

In [6]: print(db["Trees"].schema)
CREATE TABLE "Trees" (
   [id] INTEGER PRIMARY KEY,
   [TreeAddress] TEXT,
   [species_id] INTEGER,
   FOREIGN KEY(species_id) REFERENCES Species(id)
)

In [7]: print(db["Species"].schema)
CREATE TABLE [Species] (
   [id] INTEGER PRIMARY KEY,
   [CommonName] TEXT,
   [LatinName] TEXT
)
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
table.extract(...) method and "sqlite-utils extract" command 470345929  
696987925 https://github.com/simonw/sqlite-utils/issues/42#issuecomment-696987925 https://api.github.com/repos/simonw/sqlite-utils/issues/42 MDEyOklzc3VlQ29tbWVudDY5Njk4NzkyNQ== simonw 9599 2020-09-22T21:19:04Z 2020-09-22T21:19:04Z OWNER

Need to make sure this works correctly for rowid tables.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
table.extract(...) method and "sqlite-utils extract" command 470345929  
696987257 https://github.com/simonw/sqlite-utils/issues/42#issuecomment-696987257 https://api.github.com/repos/simonw/sqlite-utils/issues/42 MDEyOklzc3VlQ29tbWVudDY5Njk4NzI1Nw== simonw 9599 2020-09-22T21:17:34Z 2020-09-22T21:17:34Z OWNER

What to do if the table already exists? The .lookup() function already knows how to modify an existing table to create the correct constraints etc, so I'll rely on that mechanism.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
table.extract(...) method and "sqlite-utils extract" command 470345929  
696980709 https://github.com/simonw/sqlite-utils/issues/42#issuecomment-696980709 https://api.github.com/repos/simonw/sqlite-utils/issues/42 MDEyOklzc3VlQ29tbWVudDY5Njk4MDcwOQ== simonw 9599 2020-09-22T21:05:07Z 2020-09-22T21:05:07Z OWNER

So .extract() probably takes a batch_size= argument too, which defaults to maybe 1000.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
table.extract(...) method and "sqlite-utils extract" command 470345929  
696980503 https://github.com/simonw/sqlite-utils/issues/42#issuecomment-696980503 https://api.github.com/repos/simonw/sqlite-utils/issues/42 MDEyOklzc3VlQ29tbWVudDY5Njk4MDUwMw== simonw 9599 2020-09-22T21:04:45Z 2020-09-22T21:04:45Z OWNER

table.extract() can take an optional progress= argument which is a callback which will be used to report progress - called after each batch with (num_done, total). It will get called with (0, total) once at the start to allow progress bars to be initialized. The command-line progress bar will use this.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
table.extract(...) method and "sqlite-utils extract" command 470345929  
696979626 https://github.com/simonw/sqlite-utils/issues/42#issuecomment-696979626 https://api.github.com/repos/simonw/sqlite-utils/issues/42 MDEyOklzc3VlQ29tbWVudDY5Njk3OTYyNg== simonw 9599 2020-09-22T21:03:11Z 2020-09-22T21:03:11Z OWNER

And if you want to rename some of the columns in the new table:

db["trees"].extract(["common_name", "latin_name"], table="species", rename={"common_name": "name"})
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
table.extract(...) method and "sqlite-utils extract" command 470345929  
696979168 https://github.com/simonw/sqlite-utils/issues/42#issuecomment-696979168 https://api.github.com/repos/simonw/sqlite-utils/issues/42 MDEyOklzc3VlQ29tbWVudDY5Njk3OTE2OA== simonw 9599 2020-09-22T21:02:24Z 2020-09-22T21:02:24Z OWNER

In Python it looks like this:

# Simple case - species column species_id pointing to species table
db["trees"].extract("species")

# Setting a custom table
db["trees"].extract("species", table="Species")

# Custom foreign key column on trees
db["trees"].extract("species", fk_column="species")

# Extracting multiple columns
db["trees"].extract(["common_name", "latin_name"])
# (this creates a lookup table called common_name_latin_name ref'd by common_name_latin_name_id)

# Or with explicit table (fk_column here defaults to species_id because of the table name)
db["trees"].extract(["common_name", "latin_name"], table="species")
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
table.extract(...) method and "sqlite-utils extract" command 470345929  
696976678 https://github.com/simonw/sqlite-utils/issues/42#issuecomment-696976678 https://api.github.com/repos/simonw/sqlite-utils/issues/42 MDEyOklzc3VlQ29tbWVudDY5Njk3NjY3OA== simonw 9599 2020-09-22T20:57:57Z 2020-09-22T20:57:57Z OWNER

I think I understand the shape of this feature now. It lets you specify one or more columns on the source table which will be extracted into another table. It uses the .lookup() mechanism to populate that other table, which means each unique column value / pair / triple will be assigned an integer ID.

That integer ID gets written back into the first of the columns that are being transformed. A .transform() call then converts that column to an integer (and drops the additional columns). Finally we set up the new foreign key relationship.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
table.extract(...) method and "sqlite-utils extract" command 470345929  
696893774 https://github.com/simonw/sqlite-utils/issues/42#issuecomment-696893774 https://api.github.com/repos/simonw/sqlite-utils/issues/42 MDEyOklzc3VlQ29tbWVudDY5Njg5Mzc3NA== simonw 9599 2020-09-22T18:15:33Z 2020-09-22T18:15:33Z OWNER

I think the new foreign key column is called company_name_id by default in this example but can be customized by passing --fk-column=xxx

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
table.extract(...) method and "sqlite-utils extract" command 470345929  
696893244 https://github.com/simonw/sqlite-utils/issues/42#issuecomment-696893244 https://api.github.com/repos/simonw/sqlite-utils/issues/42 MDEyOklzc3VlQ29tbWVudDY5Njg5MzI0NA== simonw 9599 2020-09-22T18:14:33Z 2020-09-22T18:14:45Z OWNER

Thinking more about this one:

$ sqlite-utils extract my.db \
    dea_sales company_name company_address \
    --table companies

The goal here is to pull the company name and address pair out into a separate table.

Some questions:
- should this first verify that every company_name has just one company_address? I like the idea of a unique constraint on the created table for this.
- what should the foreign key column that gets added to the companies table be called?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
table.extract(...) method and "sqlite-utils extract" command 470345929  
513262013 https://github.com/simonw/sqlite-utils/issues/42#issuecomment-513262013 https://api.github.com/repos/simonw/sqlite-utils/issues/42 MDEyOklzc3VlQ29tbWVudDUxMzI2MjAxMw== simonw 9599 2019-07-19T14:58:23Z 2020-09-22T18:12:11Z OWNER

CLI design idea:

$ sqlite-utils extract my.db \
    dea_sales company_name

Here we just specify the original table and column - the new extracted table will automatically be called "company_name" and will have "id" and "value" columns, by default.

To set a custom extract table:

$ sqlite-utils extract my.db \
    dea_sales company_name \
    --table companies

And for extracting multiple columns and renaming them on the created table, maybe something like this:

$ sqlite-utils extract my.db \
    dea_sales company_name company_address \
    --table companies \
    --column company_name name \
    --column company_address address
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
table.extract(...) method and "sqlite-utils extract" command 470345929  
696567460 https://github.com/simonw/sqlite-utils/issues/42#issuecomment-696567460 https://api.github.com/repos/simonw/sqlite-utils/issues/42 MDEyOklzc3VlQ29tbWVudDY5NjU2NzQ2MA== simonw 9599 2020-09-22T07:56:42Z 2020-09-22T07:56:42Z OWNER

.transform() has landed now which should make this a lot easier to solve.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
table.extract(...) method and "sqlite-utils extract" command 470345929  
695698227 https://github.com/simonw/sqlite-utils/issues/42#issuecomment-695698227 https://api.github.com/repos/simonw/sqlite-utils/issues/42 MDEyOklzc3VlQ29tbWVudDY5NTY5ODIyNw== simonw 9599 2020-09-20T04:27:26Z 2020-09-20T04:28:26Z OWNER

This is going to need #114 (the transform_table() method) in order to convert string columns into integer foreign key columns.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
table.extract(...) method and "sqlite-utils extract" command 470345929  
513246831 https://github.com/simonw/sqlite-utils/issues/42#issuecomment-513246831 https://api.github.com/repos/simonw/sqlite-utils/issues/42 MDEyOklzc3VlQ29tbWVudDUxMzI0NjgzMQ== simonw 9599 2019-07-19T14:20:15Z 2019-07-19T14:20:49Z OWNER

Since these operations could take a long time against large tables, it would be neat if there was a progress bar option for the CLI command.

The operations are full table scans so calculating progress shouldn't be too difficult.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
table.extract(...) method and "sqlite-utils extract" command 470345929  
513246124 https://github.com/simonw/sqlite-utils/issues/42#issuecomment-513246124 https://api.github.com/repos/simonw/sqlite-utils/issues/42 MDEyOklzc3VlQ29tbWVudDUxMzI0NjEyNA== simonw 9599 2019-07-19T14:18:35Z 2019-07-19T14:19:40Z OWNER

How about the Python version? That should be easier to design.

db["dea_sales"].extract(
    columns=["company_name", "company_address"],
    to_table="companies"
)

If we want to transform the extracted data (e.g. rename those columns) maybe support a transform= argument?

db["dea_sales"].extract(
    columns=["company_name", "company_address"],
    to_table="companies",
    transform = lambda extracted: {
        "name": extracted["company_name"],
        "address":  extracted["company_address"],
    }
)

This would create a new "companies" table with three columns: id, name and address.

Would also be nice if there was a syntax for saying "... and use the value from this column as the primary key column in the newly created table".

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
table.extract(...) method and "sqlite-utils extract" command 470345929  
513244121 https://github.com/simonw/sqlite-utils/issues/42#issuecomment-513244121 https://api.github.com/repos/simonw/sqlite-utils/issues/42 MDEyOklzc3VlQ29tbWVudDUxMzI0NDEyMQ== simonw 9599 2019-07-19T14:13:33Z 2019-07-19T14:13:33Z OWNER

So what could the interface to this look like? Especially for the CLI?

One option:

sqlite-utils extract dea_sales company_name companies name

Tricky thing here is that it's quite a large number of positional arguments:

sqlite-utils extract dea_sales company_name companies name
                     Table     column       New table New column (maybe optional?)

It would be great if this could supported multiple columns - for if a spreadsheet has e.g. a “Company Name”, “Company Address” pair of fields that always match each other and areduplicated many times.

This could be handled by creating the new table with two columns that are indexed as a unique compound key. Then you can easily get-or-create on the pairs (or triples or whatever) from the original table.

Challenge here is what does the CLI syntax look like. Something like this?

$ sqlite-utils extract dea_sales -c company_name -c company_address \
  --to companies --to-col name --to-col address

Perhaps the columns in the new table are FORCED to be the same as the old ones, hence avoiding some options? Bit restrictive… maybe they default to the same but you can customize?

$ sqlite-utils extract dea_sales -c company_name -c company_address -t companies
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
table.extract(...) method and "sqlite-utils extract" command 470345929  

Advanced export

JSON shape: default, array, newline-delimited, object

CSV options:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
, [performed_via_github_app] TEXT);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);