{"html_url": "https://github.com/simonw/sqlite-utils/issues/402#issuecomment-1030901189", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/402", "id": 1030901189, "node_id": "IC_kwDOCGYnMM49ck3F", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-02-06T19:48:36Z", "updated_at": "2022-02-06T19:48:52Z", "author_association": "OWNER", "body": "From [that thread](https://github.com/simonw/sqlite-utils/issues/399#issuecomment-1030739566), two extra ideas which it may be possible to support in a single implementation:\r\n\r\n```python\r\nfrom sqlite_utils.conversions import LongitudeLatitude\r\n\r\ndb[\"places\"].insert(\r\n {\r\n \"name\": \"London\",\r\n \"lng\": -0.118092,\r\n \"lat\": 51.509865,\r\n },\r\n conversions={\"point\": LongitudeLatitude(\"lng\", \"lat\")},\r\n)\r\n```\r\nAnd\r\n```python\r\ndb[\"places\"].insert(\r\n {\r\n \"name\": \"London\",\r\n \"point\": LongitudeLatitude(-0.118092, 51.509865)\r\n }\r\n)\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1125297737, "label": "Advanced class-based `conversions=` mechanism"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/402#issuecomment-1030901853", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/402", "id": 1030901853, "node_id": "IC_kwDOCGYnMM49clBd", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-02-06T19:52:10Z", "updated_at": "2022-02-06T19:52:10Z", "author_association": "OWNER", "body": "So the key idea here is to introduce a new abstract base class, `Conversion`, which has the following abilities:\r\n\r\n- Can wrap one or more Python values (if called using the constructor) such that the `.insert_all()` method knows how to transform those into a format that can be included in an insert - something like `GeomFromText(?, 4326)` with input `POINT(-0.118092 
51.509865)`\r\n- Can be passed to `conversions={\"point\": LongitudeLatitude}` in a way that then knows to apply that conversion to every value in the `\"point\"` key of the data being inserted.\r\n- Maybe also extend `conversions=` to allow the definition of additional keys that use other columns of the row as input? That's the `conversions={\"point\": LongitudeLatitude(\"lng\", \"lat\")}` example above - it may not be possible to get this working with the rest of the design though.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1125297737, "label": "Advanced class-based `conversions=` mechanism"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/402#issuecomment-1030902102", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/402", "id": 1030902102, "node_id": "IC_kwDOCGYnMM49clFW", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-02-06T19:53:34Z", "updated_at": "2022-02-08T07:40:34Z", "author_association": "OWNER", "body": "I like the idea that the contract for `Conversion` (or rather for its subclasses) is that it can wrap a Python value and then return both the SQL fragment - e.g. 
`GeomFromText(?, 4326)` - and the values that should be used as the SQL parameters.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1125297737, "label": "Advanced class-based `conversions=` mechanism"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/402#issuecomment-1030904948", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/402", "id": 1030904948, "node_id": "IC_kwDOCGYnMM49clx0", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-02-06T20:09:42Z", "updated_at": "2022-02-08T07:40:44Z", "author_association": "OWNER", "body": "I think this is the code that needs to become aware of this system: https://github.com/simonw/sqlite-utils/blob/fea8c9bcc509bcae75e99ae8870f520103b9aa58/sqlite_utils/db.py#L2453-L2469\r\n\r\nThere's an earlier branch that runs for upserts which needs to be modified too: https://github.com/simonw/sqlite-utils/blob/fea8c9bcc509bcae75e99ae8870f520103b9aa58/sqlite_utils/db.py#L2417-L2440", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1125297737, "label": "Advanced class-based `conversions=` mechanism"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/402#issuecomment-1031779460", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/402", "id": 1031779460, "node_id": "IC_kwDOCGYnMM49f7SE", "user": {"value": 25778, "label": "eyeseast"}, "created_at": "2022-02-07T18:24:56Z", "updated_at": "2022-02-07T18:24:56Z", "author_association": "CONTRIBUTOR", "body": "I wonder if there's any overlap with the goals here and the `sqlite3` module's concept of adapters and converters: https://docs.python.org/3/library/sqlite3.html#sqlite-and-python-types\r\n\r\nI'm not sure that's 
_exactly_ what we're talking about here, but it might be a parallel with some useful ideas to borrow.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1125297737, "label": "Advanced class-based `conversions=` mechanism"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/402#issuecomment-1031787865", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/402", "id": 1031787865, "node_id": "IC_kwDOCGYnMM49f9VZ", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-02-07T18:33:27Z", "updated_at": "2022-02-07T18:33:27Z", "author_association": "OWNER", "body": "Hah, that's interesting - I've never used that mechanism before so it wasn't something that came to mind.\r\n\r\nThey seem to be using a pretty surprising trick there that takes advantage of SQLite allowing you to define a column \"type\" using a made-up type name, which you can then introspect later.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1125297737, "label": "Advanced class-based `conversions=` mechanism"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/402#issuecomment-1031791783", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/402", "id": 1031791783, "node_id": "IC_kwDOCGYnMM49f-Sn", "user": {"value": 25778, "label": "eyeseast"}, "created_at": "2022-02-07T18:37:40Z", "updated_at": "2022-02-07T18:37:40Z", "author_association": "CONTRIBUTOR", "body": "I've never used it either, but it's interesting, right? Feel like I should try it for something. 
\r\n\r\nI'm trying to get my head around how this conversions feature might work, because I really like the idea of it.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1125297737, "label": "Advanced class-based `conversions=` mechanism"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/402#issuecomment-1032294365", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/402", "id": 1032294365, "node_id": "IC_kwDOCGYnMM49h4_d", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-02-08T07:32:09Z", "updated_at": "2022-02-08T07:34:41Z", "author_association": "OWNER", "body": "I have an idea for how that third option could work - the one that creates a new column using values from the existing ones:\r\n```python\r\ndb[\"places\"].insert(\r\n {\r\n \"name\": \"London\",\r\n \"lng\": -0.118092,\r\n \"lat\": 51.509865,\r\n },\r\n conversions={\"point\": LongitudeLatitude(\"lng\", \"lat\")},\r\n)\r\n```\r\nHow about specifying that the values in that `conversions=` dictionary can be:\r\n\r\n- A SQL string fragment (as currently implemented)\r\n- A subclass of `Conversion` as described above\r\n- Or... 
a callable function that takes the row as an argument and returns either a `Conversion` subclass instance or a literal value to be inserted into the database (a string, int or float)\r\n\r\nThen you could do this:\r\n\r\n```python\r\ndb[\"places\"].insert(\r\n {\r\n \"name\": \"London\",\r\n \"lng\": -0.118092,\r\n \"lat\": 51.509865,\r\n },\r\n conversions={\r\n \"point\": lambda row: LongitudeLatitude(\r\n row[\"lng\"], row[\"lat\"]\r\n )\r\n }\r\n)\r\n```\r\nSomething I really like about this is that it expands the abilities of `conversions=` beyond the slightly obscure need to customize the SQL fragment into something that can solve other data insertion cleanup problems too.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1125297737, "label": "Advanced class-based `conversions=` mechanism"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/402#issuecomment-1032296717", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/402", "id": 1032296717, "node_id": "IC_kwDOCGYnMM49h5kN", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-02-08T07:35:46Z", "updated_at": "2022-02-08T07:35:46Z", "author_association": "OWNER", "body": "I'm going to write the documentation for this first, before the implementation, so I can see if it explains cleanly enough that the design appears to be sound.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1125297737, "label": "Advanced class-based `conversions=` mechanism"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/402#issuecomment-1032732242", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/402", "id": 1032732242, "node_id": "IC_kwDOCGYnMM49jj5S", "user": 
{"value": 25778, "label": "eyeseast"}, "created_at": "2022-02-08T15:26:59Z", "updated_at": "2022-02-08T15:26:59Z", "author_association": "CONTRIBUTOR", "body": "What if you did something like this:\r\n\r\n```python\r\n\r\nclass Conversion:\r\n    def __init__(self, *args, **kwargs):\r\n        \"Put whatever settings you need here\"\r\n\r\n    def python(self, row, column, value):  # not sure on args here\r\n        \"Python step to transform value\"\r\n        return value\r\n\r\n    def sql(self, row, column, value):\r\n        \"Return the actual sql that goes in the insert/update step, and maybe params\"\r\n        # value is the return of self.python()\r\n        return value, []\r\n```\r\n\r\nThis way, you're always passing an instance, which has methods that do the conversion. (Or you're passing a SQL string, as you would now.) The `__init__` could take column names, or SRID, or whatever other setup state you need per row, but the row is getting processed with the `python` and `sql` methods (or whatever you want to call them). This is pretty rough, so do what you will with names and args and such.\r\n\r\nYou'd then use it like this:\r\n\r\n```python\r\n# subclass might be unneeded here, if methods are present\r\nclass LngLatConversion(Conversion):\r\n    def __init__(self, x=\"longitude\", y=\"latitude\"):\r\n        self.x = x\r\n        self.y = y\r\n\r\n    def python(self, row, column, value):\r\n        x = row[self.x]\r\n        y = row[self.y]\r\n        return x, y\r\n\r\n    def sql(self, row, column, value):\r\n        # value is now a tuple, returned above\r\n        s = \"GeomFromText(POINT(? ?))\"\r\n        return s, value\r\n\r\ntable.insert_all(rows, conversions={\"point\": LngLatConversion(\"lng\", \"lat\")})\r\n```\r\n\r\nI haven't thought through all the implementation details here, and it'll probably break in ways I haven't foreseen, but wanted to get this idea out of my head. 
Hope it helps.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1125297737, "label": "Advanced class-based `conversions=` mechanism"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/402#issuecomment-1033366312", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/402", "id": 1033366312, "node_id": "IC_kwDOCGYnMM49l-so", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-02-09T05:28:11Z", "updated_at": "2022-02-09T07:28:48Z", "author_association": "OWNER", "body": "My hunch is that the case where you want to consider input from more than one column will actually be pretty rare - the only case I can think of where I would want to do that is for latitude/longitude columns - everything else that I'd want to use it for (which admittedly is still mostly SpatiaLite stuff) works against a single value.\r\n\r\nThe reason I'm leaning towards using the constructor for the values is that I really like the look of this variant for common conversions:\r\n\r\n```python\r\ndb[\"places\"].insert(\r\n {\r\n \"name\": \"London\",\r\n \"boundary\": GeometryFromGeoJSON({...})\r\n }\r\n)\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1125297737, "label": "Advanced class-based `conversions=` mechanism"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/402#issuecomment-1033428967", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/402", "id": 1033428967, "node_id": "IC_kwDOCGYnMM49mN_n", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-02-09T07:25:44Z", "updated_at": "2022-02-09T07:28:11Z", "author_association": "OWNER", "body": "The CLI version of this could perhaps look like this:\r\n\r\n 
sqlite-utils insert a.db places places.json \\\r\n --conversion boundary GeometryGeoJSON\r\n\r\nThis will treat the boundary key as GeoJSON. It's equivalent to passing `conversions={\"boundary\": GeometryGeoJSON}`\r\n\r\nThe combined latitude/longitude case here can be handled by combining this with the existing `--convert` mechanism.\r\n \r\nAny `Conversion` subclass will be available to the CLI in this way.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1125297737, "label": "Advanced class-based `conversions=` mechanism"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/402#issuecomment-1035057014", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/402", "id": 1035057014, "node_id": "IC_kwDOCGYnMM49sbd2", "user": {"value": 25778, "label": "eyeseast"}, "created_at": "2022-02-10T15:30:28Z", "updated_at": "2022-02-10T15:30:40Z", "author_association": "CONTRIBUTOR", "body": "Yeah, the CLI experience is probably where any kind of multi-column, configured setup is going to fall apart. 
Sticking with GIS examples, one way I might think about this is using the [fiona CLI](https://fiona.readthedocs.io/en/latest/cli.html):\r\n\r\n```sh\r\n# assuming a database is already created and has SpatiaLite\r\nfio cat boundary.shp | sqlite-utils insert boundaries --conversion geometry GeometryGeoJSON -\r\n```\r\n\r\nAnyway, very interested to see where you land here.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1125297737, "label": "Advanced class-based `conversions=` mechanism"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/402#issuecomment-1041325398", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/402", "id": 1041325398, "node_id": "IC_kwDOCGYnMM4-EV1W", "user": {"value": 82988, "label": "psychemedia"}, "created_at": "2022-02-16T10:12:48Z", "updated_at": "2022-02-16T10:18:55Z", "author_association": "NONE", "body": "> My hunch is that the case where you want to consider input from more than one column will actually be pretty rare - the only case I can think of where I would want to do that is for latitude/longitude columns\r\n\r\nOther possible pairs: unconventional date/datetime and timezone pairs eg `2022-02-16::17.00, London`; or more generally, numerical value and unit of measurement pairs (eg if you want to cast into and out of different measurement units using packages like `pint`) or currencies etc. 
Actually, in that case, I guess you may be presenting things that are unit typed already, and so a conversion would need to parse things into an appropriate, possibly two column `value, unit` format.\r\n\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1125297737, "label": "Advanced class-based `conversions=` mechanism"}, "performed_via_github_app": null}