{"html_url": "https://github.com/simonw/sqlite-utils/pull/333#issuecomment-974754412", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/333", "id": 974754412, "node_id": "IC_kwDOCGYnMM46GZJs", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-11-21T04:35:32Z", "updated_at": "2021-11-21T04:35:32Z", "author_association": "OWNER", "body": "Some other recent projects (like trying to get this library to work in JupyterLite) have made me much more cautious about adding new dependencies, especially dependencies like `pyarrow` which require custom C/Rust extensions.\r\n\r\nThere are a few ways this could work though:\r\n\r\n- Have this as an optional dependency feature - so it only works if the user installs `pyarrow` as well\r\n- Implement this as a separate tool, `parquet-to-sqlite` - which could itself depend on `sqlite-utils`\r\n- Add a concept of \"plugins\" to `sqlite-utils`, similar to how those work in Datasette: https://docs.datasette.io/en/stable/plugins.html\r\n\r\nMy favourite option is `parquet-to-sqlite` because that can be built without any additional changes to `sqlite-utils` at all!\r\n\r\nI find the concept of plugins for `sqlite-utils` interesting. 
I've so far not had quite enough potential use-cases to convince me this is worthwhile (especially since it should be very easy to build out separate tools entirely), but I'm ready to be convinced that a plugin mechanism would be worthwhile.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1039037439, "label": "Add functionality to read Parquet files."}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/pull/333#issuecomment-979345527", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/333", "id": 979345527, "node_id": "IC_kwDOCGYnMM46X6B3", "user": {"value": 2118708, "label": "Florents-Tselai"}, "created_at": "2021-11-25T16:31:47Z", "updated_at": "2021-11-25T16:31:47Z", "author_association": "NONE", "body": "Thanks for your reply @simonw. Tbh, my first attempt was actually the `parquet-to-sqlite` package, but I already had Makefiles that relied on `sqlite-utils`, so extending it was less intrusive to my workflow. Maybe I'll revisit that decision.\r\nFYI: there's also [`sqlite-parquet-vtable`](https://github.com/cldellow/sqlite-parquet-vtable).\r\n\r\nI don't think plugins make much sense either. 
Probably defeats the purpose of simplicity: simple database along with a pip-able package.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1039037439, "label": "Add functionality to read Parquet files."}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/pull/333#issuecomment-979442854", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/333", "id": 979442854, "node_id": "IC_kwDOCGYnMM46YRym", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-11-25T19:47:26Z", "updated_at": "2021-11-25T19:47:26Z", "author_association": "OWNER", "body": "I just remembered that there's one other place that this could fit: as a Datasette \"insert\" plugin.\r\n\r\nThis is vaporware at the moment, but the idea is that Datasette itself could grow a mechanism for importing data that's driven by plugins.\r\n\r\nOut of the box Datasette would be able to import CSV files, similar to `sqlite-utils insert ... --csv` - but plugins would then be able to add support for additional formats such as GeoJSON or - in this case - Parquet.\r\n\r\nThe neat thing about having it as a Datasette plugin is that one plugin would enable three different ways of importing data:\r\n\r\n1. Via a new `datasette insert ...` CLI option (similar to `sqlite-utils`)\r\n2. Via a web form upload interface, where authenticated Datasette users would be able to upload files\r\n3. 
Via an API interface, where files could be programmatically submitted to a running Datasette server\r\n\r\nI started fleshing out this idea quite a while ago but didn't make much concrete progress; maybe I should revisit it:\r\n\r\n- https://github.com/simonw/datasette/issues/1160", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1039037439, "label": "Add functionality to read Parquet files."}, "performed_via_github_app": null}