html_url,issue_url,id,node_id,user,user_label,created_at,updated_at,author_association,body,reactions,issue,issue_label,performed_via_github_app
https://github.com/simonw/sqlite-utils/issues/42#issuecomment-513244121,https://api.github.com/repos/simonw/sqlite-utils/issues/42,513244121,MDEyOklzc3VlQ29tbWVudDUxMzI0NDEyMQ==,9599,simonw,2019-07-19T14:13:33Z,2019-07-19T14:13:33Z,OWNER,"So what could the interface to this look like? Especially for the CLI? One option: sqlite-utils extract dea_sales company_name companies name Tricky thing here is that it's quite a large number of positional arguments: sqlite-utils extract dea_sales company_name companies name Table column New table New column (maybe optional?) It would be great if this could support multiple columns - for if a spreadsheet has e.g. a “Company Name”, “Company Address” pair of fields that always match each other and are duplicated many times. This could be handled by creating the new table with two columns that are indexed as a unique compound key. Then you can easily get-or-create on the pairs (or triples or whatever) from the original table. Challenge here is what does the CLI syntax look like. Something like this? $ sqlite-utils extract dea_sales -c company_name -c company_address \ --to companies --to-col name --to-col address Perhaps the columns in the new table are FORCED to be the same as the old ones, hence avoiding some options? Bit restrictive… maybe they default to the same but you can customize? $ sqlite-utils extract dea_sales -c company_name -c company_address -t companies","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",470345929,"table.extract(...) 
method and ""sqlite-utils extract"" command", https://github.com/simonw/sqlite-utils/issues/42#issuecomment-513246124,https://api.github.com/repos/simonw/sqlite-utils/issues/42,513246124,MDEyOklzc3VlQ29tbWVudDUxMzI0NjEyNA==,9599,simonw,2019-07-19T14:18:35Z,2019-07-19T14:19:40Z,OWNER,"How about the Python version? That should be easier to design. ```python db[""dea_sales""].extract( columns=[""company_name"", ""company_address""], to_table=""companies"" ) ``` If we want to transform the extracted data (e.g. rename those columns) maybe support a `transform=` argument? ```python db[""dea_sales""].extract( columns=[""company_name"", ""company_address""], to_table=""companies"", transform = lambda extracted: { ""name"": extracted[""company_name""], ""address"": extracted[""company_address""], } ) ``` This would create a new ""companies"" table with three columns: id, name and address. Would also be nice if there was a syntax for saying ""... and use the value from this column as the primary key column in the newly created table"".","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",470345929,"table.extract(...) method and ""sqlite-utils extract"" command", https://github.com/simonw/sqlite-utils/issues/42#issuecomment-513246831,https://api.github.com/repos/simonw/sqlite-utils/issues/42,513246831,MDEyOklzc3VlQ29tbWVudDUxMzI0NjgzMQ==,9599,simonw,2019-07-19T14:20:15Z,2019-07-19T14:20:49Z,OWNER,"Since these operations could take a long time against large tables, it would be neat if there was a progress bar option for the CLI command. The operations are full table scans so calculating progress shouldn't be too difficult.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",470345929,"table.extract(...) 
method and ""sqlite-utils extract"" command", https://github.com/simonw/sqlite-utils/issues/42#issuecomment-513262013,https://api.github.com/repos/simonw/sqlite-utils/issues/42,513262013,MDEyOklzc3VlQ29tbWVudDUxMzI2MjAxMw==,9599,simonw,2019-07-19T14:58:23Z,2020-09-22T18:12:11Z,OWNER,"CLI design idea: $ sqlite-utils extract my.db \ dea_sales company_name Here we just specify the original table and column - the new extracted table will automatically be called ""company_name"" and will have ""id"" and ""value"" columns, by default. To set a custom extract table: $ sqlite-utils extract my.db \ dea_sales company_name \ --table companies And for extracting multiple columns and renaming them on the created table, maybe something like this: $ sqlite-utils extract my.db \ dea_sales company_name company_address \ --table companies \ --column company_name name \ --column company_address address ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",470345929,"table.extract(...) method and ""sqlite-utils extract"" command", https://github.com/simonw/sqlite-utils/issues/42#issuecomment-695698227,https://api.github.com/repos/simonw/sqlite-utils/issues/42,695698227,MDEyOklzc3VlQ29tbWVudDY5NTY5ODIyNw==,9599,simonw,2020-09-20T04:27:26Z,2020-09-20T04:28:26Z,OWNER,This is going to need #114 (the `transform_table()` method) in order to convert string columns into integer foreign key columns.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",470345929,"table.extract(...) 
method and ""sqlite-utils extract"" command", https://github.com/simonw/sqlite-utils/issues/42#issuecomment-696567460,https://api.github.com/repos/simonw/sqlite-utils/issues/42,696567460,MDEyOklzc3VlQ29tbWVudDY5NjU2NzQ2MA==,9599,simonw,2020-09-22T07:56:42Z,2020-09-22T07:56:42Z,OWNER,`.transform()` has landed now which should make this a lot easier to solve.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",470345929,"table.extract(...) method and ""sqlite-utils extract"" command", https://github.com/simonw/sqlite-utils/issues/42#issuecomment-696893244,https://api.github.com/repos/simonw/sqlite-utils/issues/42,696893244,MDEyOklzc3VlQ29tbWVudDY5Njg5MzI0NA==,9599,simonw,2020-09-22T18:14:33Z,2020-09-22T18:14:45Z,OWNER,"Thinking more about this one: ``` $ sqlite-utils extract my.db \ dea_sales company_name company_address \ --table companies ``` The goal here is to pull the company name and address pair out into a separate table. Some questions: - should this first verify that every company_name has just one company_address? I like the idea of a unique constraint on the created table for this. - what should the foreign key column that gets added to the `companies` table be called?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",470345929,"table.extract(...) 
method and ""sqlite-utils extract"" command", https://github.com/simonw/sqlite-utils/issues/42#issuecomment-696893774,https://api.github.com/repos/simonw/sqlite-utils/issues/42,696893774,MDEyOklzc3VlQ29tbWVudDY5Njg5Mzc3NA==,9599,simonw,2020-09-22T18:15:33Z,2020-09-22T18:15:33Z,OWNER,I think the new foreign key column is called `company_name_id` by default in this example but can be customized by passing `--fk-column=xxx`,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",470345929,"table.extract(...) method and ""sqlite-utils extract"" command", https://github.com/simonw/sqlite-utils/issues/42#issuecomment-696976678,https://api.github.com/repos/simonw/sqlite-utils/issues/42,696976678,MDEyOklzc3VlQ29tbWVudDY5Njk3NjY3OA==,9599,simonw,2020-09-22T20:57:57Z,2020-09-22T20:57:57Z,OWNER,"I think I understand the shape of this feature now. It lets you specify one or more columns on the source table which will be extracted into another table. It uses the `.lookup()` mechanism to populate that other table, which means each unique column value / pair / triple will be assigned an integer ID. That integer ID gets written back into the first of the columns that are being transformed. A `.transform()` call then converts that column to an integer (and drops the additional columns). Finally we set up the new foreign key relationship.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",470345929,"table.extract(...) 
method and ""sqlite-utils extract"" command", https://github.com/simonw/sqlite-utils/issues/42#issuecomment-696979168,https://api.github.com/repos/simonw/sqlite-utils/issues/42,696979168,MDEyOklzc3VlQ29tbWVudDY5Njk3OTE2OA==,9599,simonw,2020-09-22T21:02:24Z,2020-09-22T21:02:24Z,OWNER,"In Python it looks like this: ```python # Simple case - species column species_id pointing to species table db[""trees""].extract(""species"") # Setting a custom table db[""trees""].extract(""species"", table=""Species"") # Custom foreign key column on trees db[""trees""].extract(""species"", fk_column=""species"") # Extracting multiple columns db[""trees""].extract([""common_name"", ""latin_name""]) # (this creates a lookup table called common_name_latin_name ref'd by common_name_latin_name_id) # Or with explicit table (fk_column here defaults to species_id because of the table name) db[""trees""].extract([""common_name"", ""latin_name""], table=""species"") ```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",470345929,"table.extract(...) method and ""sqlite-utils extract"" command", https://github.com/simonw/sqlite-utils/issues/42#issuecomment-696979626,https://api.github.com/repos/simonw/sqlite-utils/issues/42,696979626,MDEyOklzc3VlQ29tbWVudDY5Njk3OTYyNg==,9599,simonw,2020-09-22T21:03:11Z,2020-09-22T21:03:11Z,OWNER,"And if you want to rename some of the columns in the new table: ```python db[""trees""].extract([""common_name"", ""latin_name""], table=""species"", rename={""common_name"": ""name""}) ```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",470345929,"table.extract(...) 
method and ""sqlite-utils extract"" command", https://github.com/simonw/sqlite-utils/issues/42#issuecomment-696980503,https://api.github.com/repos/simonw/sqlite-utils/issues/42,696980503,MDEyOklzc3VlQ29tbWVudDY5Njk4MDUwMw==,9599,simonw,2020-09-22T21:04:45Z,2020-09-22T21:04:45Z,OWNER,"`table.extract()` can take an optional `progress=` argument which is a callback which will be used to report progress - called after each batch with `(num_done, total)`. It will get called with `(0, total)` once at the start to allow progress bars to be initialized. The command-line progress bar will use this.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",470345929,"table.extract(...) method and ""sqlite-utils extract"" command", https://github.com/simonw/sqlite-utils/issues/42#issuecomment-696980709,https://api.github.com/repos/simonw/sqlite-utils/issues/42,696980709,MDEyOklzc3VlQ29tbWVudDY5Njk4MDcwOQ==,9599,simonw,2020-09-22T21:05:07Z,2020-09-22T21:05:07Z,OWNER,"So `.extract()` probably takes a `batch_size=` argument too, which defaults to maybe 1000.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",470345929,"table.extract(...) method and ""sqlite-utils extract"" command", https://github.com/simonw/sqlite-utils/issues/42#issuecomment-696987257,https://api.github.com/repos/simonw/sqlite-utils/issues/42,696987257,MDEyOklzc3VlQ29tbWVudDY5Njk4NzI1Nw==,9599,simonw,2020-09-22T21:17:34Z,2020-09-22T21:17:34Z,OWNER,"What to do if the table already exists? The `.lookup()` function already knows how to modify an existing table to create the correct constraints etc, so I'll rely on that mechanism.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",470345929,"table.extract(...) 
method and ""sqlite-utils extract"" command", https://github.com/simonw/sqlite-utils/issues/42#issuecomment-696987925,https://api.github.com/repos/simonw/sqlite-utils/issues/42,696987925,MDEyOklzc3VlQ29tbWVudDY5Njk4NzkyNQ==,9599,simonw,2020-09-22T21:19:04Z,2020-09-22T21:19:04Z,OWNER,Need to make sure this works correctly for `rowid` tables.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",470345929,"table.extract(...) method and ""sqlite-utils extract"" command", https://github.com/simonw/sqlite-utils/issues/42#issuecomment-697012111,https://api.github.com/repos/simonw/sqlite-utils/issues/42,697012111,MDEyOklzc3VlQ29tbWVudDY5NzAxMjExMQ==,9599,simonw,2020-09-22T22:18:13Z,2020-09-22T22:18:13Z,OWNER,"Here's how I'm generating the examples for the documentation: ``` In [2]: import sqlite_utils In [3]: db = sqlite_utils.Database(memory=True) In [4]: db[""Trees""].insert({""id"": 1, ""TreeAddress"": ""52 Vine St"", ""CommonName"": ...: ""Palm"", ""LatinName"": ""foo""}, pk=""id"") Out[4]: In [5]: db[""Trees""].extract([""CommonName"", ""LatinName""], table=""Species"", fk_col ...: umn=""species_id"") In [6]: print(db[""Trees""].schema) CREATE TABLE ""Trees"" ( [id] INTEGER PRIMARY KEY, [TreeAddress] TEXT, [species_id] INTEGER, FOREIGN KEY(species_id) REFERENCES Species(id) ) In [7]: print(db[""Species""].schema) CREATE TABLE [Species] ( [id] INTEGER PRIMARY KEY, [CommonName] TEXT, [LatinName] TEXT ) ```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",470345929,"table.extract(...) 
method and ""sqlite-utils extract"" command", https://github.com/simonw/sqlite-utils/issues/42#issuecomment-697013681,https://api.github.com/repos/simonw/sqlite-utils/issues/42,697013681,MDEyOklzc3VlQ29tbWVudDY5NzAxMzY4MQ==,9599,simonw,2020-09-22T22:22:49Z,2020-09-22T22:22:49Z,OWNER,"The command-line version of this needs to accept a table and one or more columns, then a `--table` and `--fk-column` option.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",470345929,"table.extract(...) method and ""sqlite-utils extract"" command", https://github.com/simonw/sqlite-utils/issues/42#issuecomment-697019944,https://api.github.com/repos/simonw/sqlite-utils/issues/42,697019944,MDEyOklzc3VlQ29tbWVudDY5NzAxOTk0NA==,9599,simonw,2020-09-22T22:40:00Z,2020-09-22T22:40:00Z,OWNER,"I tried out the prototype of the CLI on the Global Power Plants data: ``` wget 'https://raw.githubusercontent.com/wri/global-power-plant-database/master/output_database/global_power_plant_database.csv' sqlite-utils insert global.db power_plants global_power_plant_database.csv --csv sqlite-utils extract global.db power_plants country country_long ``` This threw an error because `rowid` columns are not yet supported. I fixed that like so: ``` sqlite-utils transform global.db power_plants --rename rowid id sqlite-utils extract global.db power_plants country country_long ``` That worked! But it didn't play great with Datasette, because the resulting extracted table had columns `country` and `country_long` and neither of those are called `name` or `value` or `title`. Based on this I need to add `rowid` table support AND I need to implement the proposed `rename=` argument for renaming columns on their way into the new table. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",470345929,"table.extract(...) 
method and ""sqlite-utils extract"" command", https://github.com/simonw/sqlite-utils/issues/42#issuecomment-697025403,https://api.github.com/repos/simonw/sqlite-utils/issues/42,697025403,MDEyOklzc3VlQ29tbWVudDY5NzAyNTQwMw==,9599,simonw,2020-09-22T22:57:53Z,2020-09-22T22:57:53Z,OWNER,The documentation for the `.extract()` method is here: https://sqlite-utils.readthedocs.io/en/latest/python-api.html#extracting-columns-into-a-separate-table,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",470345929,"table.extract(...) method and ""sqlite-utils extract"" command", https://github.com/simonw/sqlite-utils/issues/42#issuecomment-697031174,https://api.github.com/repos/simonw/sqlite-utils/issues/42,697031174,MDEyOklzc3VlQ29tbWVudDY5NzAzMTE3NA==,9599,simonw,2020-09-22T23:16:00Z,2020-09-22T23:16:00Z,OWNER,"Trying this demo again: ``` wget 'https://raw.githubusercontent.com/wri/global-power-plant-database/master/output_database/global_power_plant_database.csv' sqlite-utils insert global.db power_plants global_power_plant_database.csv --csv sqlite-utils extract global.db power_plants country country_long --table countries --rename country_long name ``` It worked!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",470345929,"table.extract(...) 
method and ""sqlite-utils extract"" command", https://github.com/simonw/sqlite-utils/issues/42#issuecomment-697037974,https://api.github.com/repos/simonw/sqlite-utils/issues/42,697037974,MDEyOklzc3VlQ29tbWVudDY5NzAzNzk3NA==,9599,simonw,2020-09-22T23:39:31Z,2020-09-22T23:39:31Z,OWNER,Documentation for `sqlite-utils extract`: https://sqlite-utils.readthedocs.io/en/latest/cli.html#extracting-columns-into-a-separate-table,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",470345929,"table.extract(...) method and ""sqlite-utils extract"" command",