{"html_url": "https://github.com/simonw/sqlite-utils/issues/42#issuecomment-513246831", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/42", "id": 513246831, "node_id": "MDEyOklzc3VlQ29tbWVudDUxMzI0NjgzMQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2019-07-19T14:20:15Z", "updated_at": "2019-07-19T14:20:49Z", "author_association": "OWNER", "body": "Since these operations could take a long time against large tables, it would be neat if there was a progress bar option for the CLI command.\r\n\r\nThe operations are full table scans so calculating progress shouldn't be too difficult.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 470345929, "label": "table.extract(...) method and \"sqlite-utils extract\" command"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/42#issuecomment-513246124", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/42", "id": 513246124, "node_id": "MDEyOklzc3VlQ29tbWVudDUxMzI0NjEyNA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2019-07-19T14:18:35Z", "updated_at": "2019-07-19T14:19:40Z", "author_association": "OWNER", "body": "How about the Python version? That should be easier to design.\r\n\r\n```python\r\ndb[\"dea_sales\"].extract(\r\n columns=[\"company_name\", \"company_address\"],\r\n to_table=\"companies\"\r\n)\r\n```\r\nIf we want to transform the extracted data (e.g. 
rename those columns) maybe support a `transform=` argument?\r\n\r\n```python\r\ndb[\"dea_sales\"].extract(\r\n    columns=[\"company_name\", \"company_address\"],\r\n    to_table=\"companies\",\r\n    transform=lambda extracted: {\r\n        \"name\": extracted[\"company_name\"],\r\n        \"address\": extracted[\"company_address\"],\r\n    }\r\n)\r\n```\r\nThis would create a new \"companies\" table with three columns: id, name and address.\r\n\r\nWould also be nice if there was a syntax for saying \"... and use the value from this column as the primary key column in the newly created table\".", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 470345929, "label": "table.extract(...) method and \"sqlite-utils extract\" command"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/42#issuecomment-513244121", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/42", "id": 513244121, "node_id": "MDEyOklzc3VlQ29tbWVudDUxMzI0NDEyMQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2019-07-19T14:13:33Z", "updated_at": "2019-07-19T14:13:33Z", "author_association": "OWNER", "body": "So what could the interface to this look like? Especially for the CLI?\r\n\r\nOne option:\r\n\r\n    sqlite-utils extract dea_sales company_name companies name\r\n\r\nTricky thing here is that it's quite a large number of positional arguments:\r\n\r\n    sqlite-utils extract dea_sales company_name companies name\r\n                         Table     column       New table New column (maybe optional?)\r\n\r\nIt would be great if this could support multiple columns - for example if a spreadsheet has a \u201cCompany Name\u201d, \u201cCompany Address\u201d pair of fields that always match each other and are duplicated many times.\r\n\r\nThis could be handled by creating the new table with two columns that are indexed as a unique compound key. 
Then you can easily get-or-create on the pairs (or triples or whatever) from the original table.\r\n\r\nChallenge here is what does the CLI syntax look like. Something like this?\r\n\r\n $ sqlite-utils extract dea_sales -c company_name -c company_address \\\r\n --to companies --to-col name --to-col address\r\n\r\nPerhaps the columns in the new table are FORCED to be the same as the old ones, hence avoiding some options? Bit restrictive\u2026 maybe they default to the same but you can customize?\r\n\r\n $ sqlite-utils extract dea_sales -c company_name -c company_address -t companies", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 470345929, "label": "table.extract(...) method and \"sqlite-utils extract\" command"}, "performed_via_github_app": null}