{"html_url": "https://github.com/simonw/sqlite-utils/issues/42#issuecomment-697037974", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/42", "id": 697037974, "node_id": "MDEyOklzc3VlQ29tbWVudDY5NzAzNzk3NA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-09-22T23:39:31Z", "updated_at": "2020-09-22T23:39:31Z", "author_association": "OWNER", "body": "Documentation for `sqlite-utils extract`: https://sqlite-utils.readthedocs.io/en/latest/cli.html#extracting-columns-into-a-separate-table", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 470345929, "label": "table.extract(...) method and \"sqlite-utils extract\" command"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/42#issuecomment-697031174", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/42", "id": 697031174, "node_id": "MDEyOklzc3VlQ29tbWVudDY5NzAzMTE3NA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-09-22T23:16:00Z", "updated_at": "2020-09-22T23:16:00Z", "author_association": "OWNER", "body": "Trying this demo again:\r\n```\r\nwget 'https://raw.githubusercontent.com/wri/global-power-plant-database/master/output_database/global_power_plant_database.csv'\r\nsqlite-utils insert global.db power_plants global_power_plant_database.csv --csv\r\nsqlite-utils extract global.db power_plants country country_long --table countries --rename country_long name\r\n```\r\nIt worked!", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 470345929, "label": "table.extract(...) method and \"sqlite-utils extract\" command"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/42#issuecomment-697025403", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/42", "id": 697025403, "node_id": "MDEyOklzc3VlQ29tbWVudDY5NzAyNTQwMw==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-09-22T22:57:53Z", "updated_at": "2020-09-22T22:57:53Z", "author_association": "OWNER", "body": "The documentation for the `.extract()` method is here: https://sqlite-utils.readthedocs.io/en/latest/python-api.html#extracting-columns-into-a-separate-table", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 470345929, "label": "table.extract(...) method and \"sqlite-utils extract\" command"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/42#issuecomment-697019944", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/42", "id": 697019944, "node_id": "MDEyOklzc3VlQ29tbWVudDY5NzAxOTk0NA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-09-22T22:40:00Z", "updated_at": "2020-09-22T22:40:00Z", "author_association": "OWNER", "body": "I tried out the prototype of the CLI on the Global Power Plants data:\r\n```\r\nwget 'https://raw.githubusercontent.com/wri/global-power-plant-database/master/output_database/global_power_plant_database.csv'\r\nsqlite-utils insert global.db power_plants global_power_plant_database.csv --csv\r\nsqlite-utils extract global.db power_plants country country_long\r\n```\r\nThis threw an error because `rowid` columns are not yet supported. I fixed that like so:\r\n```\r\nsqlite-utils transform global.db power_plants --rename rowid id\r\nsqlite-utils extract global.db power_plants country country_long\r\n```\r\nThat worked! But it didn't play great with Datasette, because the resulting extracted table had columns `country` and `country_long` and neither of those are called `name` or `value` or `title`.\r\n\r\nBased on this I need to add `rowid` table support AND I need to implement the proposed `rename=` argument for renaming columns on their way into the new table.\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 470345929, "label": "table.extract(...) method and \"sqlite-utils extract\" command"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/42#issuecomment-697013681", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/42", "id": 697013681, "node_id": "MDEyOklzc3VlQ29tbWVudDY5NzAxMzY4MQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-09-22T22:22:49Z", "updated_at": "2020-09-22T22:22:49Z", "author_association": "OWNER", "body": "The command-line version of this needs to accept a table and one or more columns, then a `--table` and `--fk-column` option.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 470345929, "label": "table.extract(...) method and \"sqlite-utils extract\" command"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/42#issuecomment-697012111", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/42", "id": 697012111, "node_id": "MDEyOklzc3VlQ29tbWVudDY5NzAxMjExMQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-09-22T22:18:13Z", "updated_at": "2020-09-22T22:18:13Z", "author_association": "OWNER", "body": "Here's how I'm generating the examples for the documentation:\r\n```\r\nIn [2]: import sqlite_utils\r\n\r\nIn [3]: db = sqlite_utils.Database(memory=True)\r\n\r\nIn [4]: db[\"Trees\"].insert({\"id\": 1, \"TreeAddress\": \"52 Vine St\", \"CommonName\":\r\n ...: \"Palm\", \"LatinName\": \"foo\"}, pk=\"id\")\r\nOut[4]: \r\n\r\nIn [5]: db[\"Trees\"].extract([\"CommonName\", \"LatinName\"], table=\"Species\", fk_col\r\n ...: umn=\"species_id\")\r\n\r\nIn [6]: print(db[\"Trees\"].schema)\r\nCREATE TABLE \"Trees\" (\r\n [id] INTEGER PRIMARY KEY,\r\n [TreeAddress] TEXT,\r\n [species_id] INTEGER,\r\n FOREIGN KEY(species_id) REFERENCES Species(id)\r\n)\r\n\r\nIn [7]: print(db[\"Species\"].schema)\r\nCREATE TABLE [Species] (\r\n [id] INTEGER PRIMARY KEY,\r\n [CommonName] TEXT,\r\n [LatinName] TEXT\r\n)\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 470345929, "label": "table.extract(...) method and \"sqlite-utils extract\" command"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/42#issuecomment-696987925", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/42", "id": 696987925, "node_id": "MDEyOklzc3VlQ29tbWVudDY5Njk4NzkyNQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-09-22T21:19:04Z", "updated_at": "2020-09-22T21:19:04Z", "author_association": "OWNER", "body": "Need to make sure this works correctly for `rowid` tables.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 470345929, "label": "table.extract(...) method and \"sqlite-utils extract\" command"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/42#issuecomment-696987257", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/42", "id": 696987257, "node_id": "MDEyOklzc3VlQ29tbWVudDY5Njk4NzI1Nw==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-09-22T21:17:34Z", "updated_at": "2020-09-22T21:17:34Z", "author_association": "OWNER", "body": "What to do if the table already exists? The `.lookup()` function already knows how to modify an existing table to create the correct constraints etc, so I'll rely on that mechanism.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 470345929, "label": "table.extract(...) method and \"sqlite-utils extract\" command"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/42#issuecomment-696980709", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/42", "id": 696980709, "node_id": "MDEyOklzc3VlQ29tbWVudDY5Njk4MDcwOQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-09-22T21:05:07Z", "updated_at": "2020-09-22T21:05:07Z", "author_association": "OWNER", "body": "So `.extract()` probably takes a `batch_size=` argument too, which defaults to maybe 1000.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 470345929, "label": "table.extract(...) method and \"sqlite-utils extract\" command"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/42#issuecomment-696980503", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/42", "id": 696980503, "node_id": "MDEyOklzc3VlQ29tbWVudDY5Njk4MDUwMw==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-09-22T21:04:45Z", "updated_at": "2020-09-22T21:04:45Z", "author_association": "OWNER", "body": "`table.extract()` can take an optional `progress=` argument which is a callback which will be used to report progress - called after each batch with `(num_done, total)`. It will get called with `(0, total)` once at the start to allow progress bars to be initialized. The command-line progress bar will use this.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 470345929, "label": "table.extract(...) method and \"sqlite-utils extract\" command"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/42#issuecomment-696979626", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/42", "id": 696979626, "node_id": "MDEyOklzc3VlQ29tbWVudDY5Njk3OTYyNg==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-09-22T21:03:11Z", "updated_at": "2020-09-22T21:03:11Z", "author_association": "OWNER", "body": "And if you want to rename some of the columns in the new table:\r\n```python\r\ndb[\"trees\"].extract([\"common_name\", \"latin_name\"], table=\"species\", rename={\"common_name\": \"name\"})\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 470345929, "label": "table.extract(...) method and \"sqlite-utils extract\" command"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/42#issuecomment-696979168", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/42", "id": 696979168, "node_id": "MDEyOklzc3VlQ29tbWVudDY5Njk3OTE2OA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-09-22T21:02:24Z", "updated_at": "2020-09-22T21:02:24Z", "author_association": "OWNER", "body": "In Python it looks like this:\r\n```python\r\n# Simple case - species column species_id pointing to species table\r\ndb[\"trees\"].extract(\"species\")\r\n\r\n# Setting a custom table\r\ndb[\"trees\"].extract(\"species\", table=\"Species\")\r\n\r\n# Custom foreign key column on trees\r\ndb[\"trees\"].extract(\"species\", fk_column=\"species\")\r\n\r\n# Extracting multiple columns\r\ndb[\"trees\"].extract([\"common_name\", \"latin_name\"])\r\n# (this creates a lookup table called common_name_latin_name ref'd by common_name_latin_name_id)\r\n\r\n# Or with explicit table (fk_column here defaults to species_id because of the table name)\r\ndb[\"trees\"].extract([\"common_name\", \"latin_name\"], table=\"species\")\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 470345929, "label": "table.extract(...) method and \"sqlite-utils extract\" command"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/42#issuecomment-696976678", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/42", "id": 696976678, "node_id": "MDEyOklzc3VlQ29tbWVudDY5Njk3NjY3OA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-09-22T20:57:57Z", "updated_at": "2020-09-22T20:57:57Z", "author_association": "OWNER", "body": "I think I understand the shape of this feature now. It lets you specify one or more columns on the source table which will be extracted into another table. It uses the `.lookup()` mechanism to populate that other table, which means each unique column value / pair / triple will be assigned an integer ID.\r\n\r\nThat integer ID gets written back into the first of the columns that are being transformed. A `.transform()` call then converts that column to an integer (and drops the additional columns). Finally we set up the new foreign key relationship.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 470345929, "label": "table.extract(...) method and \"sqlite-utils extract\" command"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/42#issuecomment-696893774", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/42", "id": 696893774, "node_id": "MDEyOklzc3VlQ29tbWVudDY5Njg5Mzc3NA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-09-22T18:15:33Z", "updated_at": "2020-09-22T18:15:33Z", "author_association": "OWNER", "body": "I think the new foreign key column is called `company_name_id` by default in this example but can be customized by passing `--fk-column=xxx`", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 470345929, "label": "table.extract(...) method and \"sqlite-utils extract\" command"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/42#issuecomment-696893244", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/42", "id": 696893244, "node_id": "MDEyOklzc3VlQ29tbWVudDY5Njg5MzI0NA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-09-22T18:14:33Z", "updated_at": "2020-09-22T18:14:45Z", "author_association": "OWNER", "body": "Thinking more about this one:\r\n```\r\n$ sqlite-utils extract my.db \\\r\n dea_sales company_name company_address \\\r\n --table companies\r\n```\r\nThe goal here is to pull the company name and address pair out into a separate table.\r\n\r\nSome questions:\r\n- should this first verify that every company_name has just one company_address? I like the idea of a unique constraint on the created table for this.\r\n- what should the foreign key column that gets added to the `companies` table be called?", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 470345929, "label": "table.extract(...) method and \"sqlite-utils extract\" command"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/42#issuecomment-513262013", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/42", "id": 513262013, "node_id": "MDEyOklzc3VlQ29tbWVudDUxMzI2MjAxMw==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2019-07-19T14:58:23Z", "updated_at": "2020-09-22T18:12:11Z", "author_association": "OWNER", "body": "CLI design idea:\r\n\r\n $ sqlite-utils extract my.db \\\r\n dea_sales company_name\r\n\r\nHere we just specify the original table and column - the new extracted table will automatically be called \"company_name\" and will have \"id\" and \"value\" columns, by default.\r\n\r\nTo set a custom extract table:\r\n\r\n $ sqlite-utils extract my.db \\\r\n dea_sales company_name \\\r\n --table companies\r\n\r\nAnd for extracting multiple columns and renaming them on the created table, maybe something like this:\r\n\r\n $ sqlite-utils extract my.db \\\r\n dea_sales company_name company_address \\\r\n --table companies \\\r\n --column company_name name \\\r\n --column company_address address\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 470345929, "label": "table.extract(...) method and \"sqlite-utils extract\" command"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/42#issuecomment-696567460", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/42", "id": 696567460, "node_id": "MDEyOklzc3VlQ29tbWVudDY5NjU2NzQ2MA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-09-22T07:56:42Z", "updated_at": "2020-09-22T07:56:42Z", "author_association": "OWNER", "body": "`.transform()` has landed now which should make this a lot easier to solve.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 470345929, "label": "table.extract(...) method and \"sqlite-utils extract\" command"}, "performed_via_github_app": null}