html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,issue,performed_via_github_app https://github.com/simonw/sqlite-utils/issues/42#issuecomment-513244121,https://api.github.com/repos/simonw/sqlite-utils/issues/42,513244121,MDEyOklzc3VlQ29tbWVudDUxMzI0NDEyMQ==,9599,2019-07-19T14:13:33Z,2019-07-19T14:13:33Z,OWNER,"So what could the interface to this look like? Especially for the CLI? One option: sqlite-utils extract dea_sales company_name companies name Tricky thing here is that it's quite a large number of positional arguments: sqlite-utils extract dea_sales company_name companies name Table column New table New column (maybe optional?) It would be great if this could support multiple columns - for if a spreadsheet has e.g. a “Company Name”, “Company Address” pair of fields that always match each other and are duplicated many times. This could be handled by creating the new table with two columns that are indexed as a unique compound key. Then you can easily get-or-create on the pairs (or triples or whatever) from the original table. Challenge here is what does the CLI syntax look like. Something like this? $ sqlite-utils extract dea_sales -c company_name -c company_address \ --to companies --to-col name --to-col address Perhaps the columns in the new table are FORCED to be the same as the old ones, hence avoiding some options? Bit restrictive… maybe they default to the same but you can customize? $ sqlite-utils extract dea_sales -c company_name -c company_address -t companies","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",470345929, https://github.com/simonw/sqlite-utils/issues/42#issuecomment-513246124,https://api.github.com/repos/simonw/sqlite-utils/issues/42,513246124,MDEyOklzc3VlQ29tbWVudDUxMzI0NjEyNA==,9599,2019-07-19T14:18:35Z,2019-07-19T14:19:40Z,OWNER,"How about the Python version? That should be easier to design. 
```python db[""dea_sales""].extract( columns=[""company_name"", ""company_address""], to_table=""companies"" ) ``` If we want to transform the extracted data (e.g. rename those columns) maybe support a `transform=` argument? ```python db[""dea_sales""].extract( columns=[""company_name"", ""company_address""], to_table=""companies"", transform = lambda extracted: { ""name"": extracted[""company_name""], ""address"": extracted[""company_address""], } ) ``` This would create a new ""companies"" table with three columns: id, name and address. Would also be nice if there was a syntax for saying ""... and use the value from this column as the primary key column in the newly created table"".","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",470345929, https://github.com/simonw/sqlite-utils/issues/42#issuecomment-513246831,https://api.github.com/repos/simonw/sqlite-utils/issues/42,513246831,MDEyOklzc3VlQ29tbWVudDUxMzI0NjgzMQ==,9599,2019-07-19T14:20:15Z,2019-07-19T14:20:49Z,OWNER,"Since these operations could take a long time against large tables, it would be neat if there was a progress bar option for the CLI command. The operations are full table scans so calculating progress shouldn't be too difficult.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",470345929, https://github.com/simonw/sqlite-utils/issues/42#issuecomment-513262013,https://api.github.com/repos/simonw/sqlite-utils/issues/42,513262013,MDEyOklzc3VlQ29tbWVudDUxMzI2MjAxMw==,9599,2019-07-19T14:58:23Z,2020-09-22T18:12:11Z,OWNER,"CLI design idea: $ sqlite-utils extract my.db \ dea_sales company_name Here we just specify the original table and column - the new extracted table will automatically be called ""company_name"" and will have ""id"" and ""value"" columns, by default. 
To set a custom extract table: $ sqlite-utils extract my.db \ dea_sales company_name \ --table companies And for extracting multiple columns and renaming them on the created table, maybe something like this: $ sqlite-utils extract my.db \ dea_sales company_name company_address \ --table companies \ --column company_name name \ --column company_address address ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",470345929, https://github.com/simonw/sqlite-utils/issues/42#issuecomment-695698227,https://api.github.com/repos/simonw/sqlite-utils/issues/42,695698227,MDEyOklzc3VlQ29tbWVudDY5NTY5ODIyNw==,9599,2020-09-20T04:27:26Z,2020-09-20T04:28:26Z,OWNER,This is going to need #114 (the `transform_table()` method) in order to convert string columns into integer foreign key columns.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",470345929, https://github.com/simonw/sqlite-utils/issues/42#issuecomment-696567460,https://api.github.com/repos/simonw/sqlite-utils/issues/42,696567460,MDEyOklzc3VlQ29tbWVudDY5NjU2NzQ2MA==,9599,2020-09-22T07:56:42Z,2020-09-22T07:56:42Z,OWNER,`.transform()` has landed now which should make this a lot easier to solve.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",470345929, https://github.com/simonw/sqlite-utils/issues/42#issuecomment-696893244,https://api.github.com/repos/simonw/sqlite-utils/issues/42,696893244,MDEyOklzc3VlQ29tbWVudDY5Njg5MzI0NA==,9599,2020-09-22T18:14:33Z,2020-09-22T18:14:45Z,OWNER,"Thinking more about this one: ``` $ sqlite-utils extract my.db \ dea_sales company_name company_address \ --table companies ``` The goal here is to pull the company name and address pair out into a separate table. 
Some questions: - should this first verify that every company_name has just one company_address? I like the idea of a unique constraint on the created table for this. - what should the foreign key column that gets added to the `companies` table be called?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",470345929, https://github.com/simonw/sqlite-utils/issues/42#issuecomment-696893774,https://api.github.com/repos/simonw/sqlite-utils/issues/42,696893774,MDEyOklzc3VlQ29tbWVudDY5Njg5Mzc3NA==,9599,2020-09-22T18:15:33Z,2020-09-22T18:15:33Z,OWNER,I think the new foreign key column is called `company_name_id` by default in this example but can be customized by passing `--fk-column=xxx`,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",470345929, https://github.com/simonw/sqlite-utils/issues/42#issuecomment-696976678,https://api.github.com/repos/simonw/sqlite-utils/issues/42,696976678,MDEyOklzc3VlQ29tbWVudDY5Njk3NjY3OA==,9599,2020-09-22T20:57:57Z,2020-09-22T20:57:57Z,OWNER,"I think I understand the shape of this feature now. It lets you specify one or more columns on the source table which will be extracted into another table. It uses the `.lookup()` mechanism to populate that other table, which means each unique column value / pair / triple will be assigned an integer ID. That integer ID gets written back into the first of the columns that are being transformed. A `.transform()` call then converts that column to an integer (and drops the additional columns). 
Finally we set up the new foreign key relationship.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",470345929, https://github.com/simonw/sqlite-utils/issues/42#issuecomment-696979168,https://api.github.com/repos/simonw/sqlite-utils/issues/42,696979168,MDEyOklzc3VlQ29tbWVudDY5Njk3OTE2OA==,9599,2020-09-22T21:02:24Z,2020-09-22T21:02:24Z,OWNER,"In Python it looks like this: ```python # Simple case - species column species_id pointing to species table db[""trees""].extract(""species"") # Setting a custom table db[""trees""].extract(""species"", table=""Species"") # Custom foreign key column on trees db[""trees""].extract(""species"", fk_column=""species"") # Extracting multiple columns db[""trees""].extract([""common_name"", ""latin_name""]) # (this creates a lookup table called common_name_latin_name ref'd by common_name_latin_name_id) # Or with explicit table (fk_column here defaults to species_id because of the table name) db[""trees""].extract([""common_name"", ""latin_name""], table=""species"") ```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",470345929, https://github.com/simonw/sqlite-utils/issues/42#issuecomment-696979626,https://api.github.com/repos/simonw/sqlite-utils/issues/42,696979626,MDEyOklzc3VlQ29tbWVudDY5Njk3OTYyNg==,9599,2020-09-22T21:03:11Z,2020-09-22T21:03:11Z,OWNER,"And if you want to rename some of the columns in the new table: ```python db[""trees""].extract([""common_name"", ""latin_name""], table=""species"", rename={""common_name"": ""name""}) ```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",470345929, 
https://github.com/simonw/sqlite-utils/issues/42#issuecomment-696980503,https://api.github.com/repos/simonw/sqlite-utils/issues/42,696980503,MDEyOklzc3VlQ29tbWVudDY5Njk4MDUwMw==,9599,2020-09-22T21:04:45Z,2020-09-22T21:04:45Z,OWNER,"`table.extract()` can take an optional `progress=` argument which is a callback which will be used to report progress - called after each batch with `(num_done, total)`. It will get called with `(0, total)` once at the start to allow progress bars to be initialized. The command-line progress bar will use this.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",470345929, https://github.com/simonw/sqlite-utils/issues/42#issuecomment-696980709,https://api.github.com/repos/simonw/sqlite-utils/issues/42,696980709,MDEyOklzc3VlQ29tbWVudDY5Njk4MDcwOQ==,9599,2020-09-22T21:05:07Z,2020-09-22T21:05:07Z,OWNER,"So `.extract()` probably takes a `batch_size=` argument too, which defaults to maybe 1000.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",470345929, https://github.com/simonw/sqlite-utils/issues/42#issuecomment-696987257,https://api.github.com/repos/simonw/sqlite-utils/issues/42,696987257,MDEyOklzc3VlQ29tbWVudDY5Njk4NzI1Nw==,9599,2020-09-22T21:17:34Z,2020-09-22T21:17:34Z,OWNER,"What to do if the table already exists? 
The `.lookup()` function already knows how to modify an existing table to create the correct constraints etc, so I'll rely on that mechanism.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",470345929, https://github.com/simonw/sqlite-utils/issues/42#issuecomment-696987925,https://api.github.com/repos/simonw/sqlite-utils/issues/42,696987925,MDEyOklzc3VlQ29tbWVudDY5Njk4NzkyNQ==,9599,2020-09-22T21:19:04Z,2020-09-22T21:19:04Z,OWNER,Need to make sure this works correctly for `rowid` tables.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",470345929, https://github.com/simonw/sqlite-utils/issues/42#issuecomment-697012111,https://api.github.com/repos/simonw/sqlite-utils/issues/42,697012111,MDEyOklzc3VlQ29tbWVudDY5NzAxMjExMQ==,9599,2020-09-22T22:18:13Z,2020-09-22T22:18:13Z,OWNER,"Here's how I'm generating the examples for the documentation: ``` In [2]: import sqlite_utils In [3]: db = sqlite_utils.Database(memory=True) In [4]: db[""Trees""].insert({""id"": 1, ""TreeAddress"": ""52 Vine St"", ""CommonName"": ...: ""Palm"", ""LatinName"": ""foo""}, pk=""id"") Out[4]: In [5]: db[""Trees""].extract([""CommonName"", ""LatinName""], table=""Species"", fk_col ...: umn=""species_id"") In [6]: print(db[""Trees""].schema) CREATE TABLE ""Trees"" ( [id] INTEGER PRIMARY KEY, [TreeAddress] TEXT, [species_id] INTEGER, FOREIGN KEY(species_id) REFERENCES Species(id) ) In [7]: print(db[""Species""].schema) CREATE TABLE [Species] ( [id] INTEGER PRIMARY KEY, [CommonName] TEXT, [LatinName] TEXT ) ```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",470345929, 
https://github.com/simonw/sqlite-utils/issues/42#issuecomment-697013681,https://api.github.com/repos/simonw/sqlite-utils/issues/42,697013681,MDEyOklzc3VlQ29tbWVudDY5NzAxMzY4MQ==,9599,2020-09-22T22:22:49Z,2020-09-22T22:22:49Z,OWNER,"The command-line version of this needs to accept a table and one or more columns, then a `--table` and `--fk-column` option.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",470345929, https://github.com/simonw/sqlite-utils/issues/42#issuecomment-697019944,https://api.github.com/repos/simonw/sqlite-utils/issues/42,697019944,MDEyOklzc3VlQ29tbWVudDY5NzAxOTk0NA==,9599,2020-09-22T22:40:00Z,2020-09-22T22:40:00Z,OWNER,"I tried out the prototype of the CLI on the Global Power Plants data: ``` wget 'https://raw.githubusercontent.com/wri/global-power-plant-database/master/output_database/global_power_plant_database.csv' sqlite-utils insert global.db power_plants global_power_plant_database.csv --csv sqlite-utils extract global.db power_plants country country_long ``` This threw an error because `rowid` columns are not yet supported. I fixed that like so: ``` sqlite-utils transform global.db power_plants --rename rowid id sqlite-utils extract global.db power_plants country country_long ``` That worked! But it didn't play great with Datasette, because the resulting extracted table had columns `country` and `country_long` and neither of those are called `name` or `value` or `title`. Based on this I need to add `rowid` table support AND I need to implement the proposed `rename=` argument for renaming columns on their way into the new table. 
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",470345929, https://github.com/simonw/sqlite-utils/issues/42#issuecomment-697025403,https://api.github.com/repos/simonw/sqlite-utils/issues/42,697025403,MDEyOklzc3VlQ29tbWVudDY5NzAyNTQwMw==,9599,2020-09-22T22:57:53Z,2020-09-22T22:57:53Z,OWNER,The documentation for the `.extract()` method is here: https://sqlite-utils.readthedocs.io/en/latest/python-api.html#extracting-columns-into-a-separate-table,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",470345929, https://github.com/simonw/sqlite-utils/issues/42#issuecomment-697031174,https://api.github.com/repos/simonw/sqlite-utils/issues/42,697031174,MDEyOklzc3VlQ29tbWVudDY5NzAzMTE3NA==,9599,2020-09-22T23:16:00Z,2020-09-22T23:16:00Z,OWNER,"Trying this demo again: ``` wget 'https://raw.githubusercontent.com/wri/global-power-plant-database/master/output_database/global_power_plant_database.csv' sqlite-utils insert global.db power_plants global_power_plant_database.csv --csv sqlite-utils extract global.db power_plants country country_long --table countries --rename country_long name ``` It worked!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",470345929, https://github.com/simonw/sqlite-utils/issues/42#issuecomment-697037974,https://api.github.com/repos/simonw/sqlite-utils/issues/42,697037974,MDEyOklzc3VlQ29tbWVudDY5NzAzNzk3NA==,9599,2020-09-22T23:39:31Z,2020-09-22T23:39:31Z,OWNER,Documentation for `sqlite-utils extract`: https://sqlite-utils.readthedocs.io/en/latest/cli.html#extracting-columns-into-a-separate-table,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",470345929,