{"html_url": "https://github.com/simonw/sqlite-utils/issues/412#issuecomment-1155364367", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/412", "id": 1155364367, "node_id": "IC_kwDOCGYnMM5E3XYP", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-06-14T15:36:28Z", "updated_at": "2022-06-14T15:36:28Z", "author_association": "OWNER", "body": "Here's as far as I got with my initial prototype, in `sqlite_utils/pandas.py`:\r\n\r\n```python\r\nfrom .db import Database as _Database, Table as _Table, View as _View\r\nimport pandas as pd\r\nfrom typing import (\r\n Iterable,\r\n Union,\r\n Optional,\r\n)\r\n\r\n\r\nclass Database(_Database):\r\n def query(\r\n self, sql: str, params: Optional[Union[Iterable, dict]] = None\r\n ) -> pd.DataFrame:\r\n return pd.DataFrame(super().query(sql, params))\r\n\r\n def table(self, table_name: str, **kwargs) -> Union[\"Table\", \"View\"]:\r\n \"Return a table object, optionally configured with default options.\"\r\n klass = View if table_name in self.view_names() else Table\r\n return klass(self, table_name, **kwargs)\r\n\r\n\r\nclass PandasQueryable:\r\n def rows_where(\r\n self,\r\n where: str = None,\r\n where_args: Optional[Union[Iterable, dict]] = None,\r\n order_by: str = None,\r\n select: str = \"*\",\r\n limit: int = None,\r\n offset: int = None,\r\n ) -> pd.DataFrame:\r\n return pd.DataFrame(\r\n super().rows_where(\r\n where,\r\n where_args,\r\n order_by=order_by,\r\n select=select,\r\n limit=limit,\r\n offset=offset,\r\n )\r\n )\r\n\r\n\r\nclass Table(PandasQueryable, _Table):\r\n pass\r\n\r\n\r\nclass View(PandasQueryable, _View):\r\n pass\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1160182768, "label": "Optional Pandas integration"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/412#issuecomment-1059652834", 
"issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/412", "id": 1059652834, "node_id": "IC_kwDOCGYnMM4_KQTi", "user": {"value": 596279, "label": "zaneselvans"}, "created_at": "2022-03-05T02:14:40Z", "updated_at": "2022-03-05T02:14:40Z", "author_association": "NONE", "body": "We do a lot of `df.to_sql()` to write into sqlite, mostly in [this module](https://github.com/catalyst-cooperative/pudl/blob/main/src/pudl/load.py#L25)", "reactions": "{\"total_count\": 2, \"+1\": 2, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1160182768, "label": "Optional Pandas integration"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/412#issuecomment-1059652538", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/412", "id": 1059652538, "node_id": "IC_kwDOCGYnMM4_KQO6", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-03-05T02:13:17Z", "updated_at": "2022-03-05T02:13:17Z", "author_association": "OWNER", "body": "> It looks like the existing `pd.read_sql_query()` method has an optional dependency on SQLAlchemy:\r\n> \r\n> ```\r\n> ...\r\n> import pandas as pd\r\n> pd.read_sql_query(db.conn, \"select * from articles\")\r\n> # ImportError: Using URI string without sqlalchemy installed.\r\n> ```\r\nHah, no I was wrong about this: SQLAlchemy is not needed for SQLite to work, I just had the arguments the wrong way round:\r\n```python\r\npd.read_sql_query(\"select * from articles\", db.conn)\r\n# Shows a DataFrame\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1160182768, "label": "Optional Pandas integration"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/412#issuecomment-1059651306", "issue_url": 
"https://api.github.com/repos/simonw/sqlite-utils/issues/412", "id": 1059651306, "node_id": "IC_kwDOCGYnMM4_KP7q", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-03-05T02:10:49Z", "updated_at": "2022-03-05T02:10:49Z", "author_association": "OWNER", "body": "I could teach `.insert_all()` and `.upsert_all()` to optionally accept a DataFrame. A challenge there is `mypy` - if Pandas is an optional dependency, is it possible to declare types that accept a Union that includes DataFrame?", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1160182768, "label": "Optional Pandas integration"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/412#issuecomment-1059651056", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/412", "id": 1059651056, "node_id": "IC_kwDOCGYnMM4_KP3w", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-03-05T02:09:38Z", "updated_at": "2022-03-05T02:09:38Z", "author_association": "OWNER", "body": "OK, so reading results from existing `sqlite-utils` into a Pandas DataFrame turns out to be trivial.\r\n\r\nHow about writing a DataFrame to a database table?\r\n\r\nThat feels like it could be a lot more useful.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1160182768, "label": "Optional Pandas integration"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/412#issuecomment-1059650190", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/412", "id": 1059650190, "node_id": "IC_kwDOCGYnMM4_KPqO", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-03-05T02:04:43Z", "updated_at": "2022-03-05T02:04:54Z", "author_association": "OWNER", "body": "To be honest, I'm 
having second thoughts about this now mainly because the idiom for turning a generator of dicts into a DataFrame is SO simple:\r\n\r\n```python\r\ndf = pd.DataFrame(db.query(\"select * from articles\"))\r\n```\r\nGiven it's that simple, I'm questioning if there's any value to adding this to `sqlite-utils` at all. This likely becomes a documentation thing instead!", "reactions": "{\"total_count\": 2, \"+1\": 2, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1160182768, "label": "Optional Pandas integration"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/412#issuecomment-1059649803", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/412", "id": 1059649803, "node_id": "IC_kwDOCGYnMM4_KPkL", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-03-05T02:02:41Z", "updated_at": "2022-03-05T02:02:41Z", "author_association": "OWNER", "body": "It looks like the existing `pd.read_sql_query()` method has an optional dependency on SQLAlchemy:\r\n\r\n```\r\n...\r\nimport pandas as pd\r\npd.read_sql_query(db.conn, \"select * from articles\")\r\n# ImportError: Using URI string without sqlalchemy installed.\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1160182768, "label": "Optional Pandas integration"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/412#issuecomment-1059649213", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/412", "id": 1059649213, "node_id": "IC_kwDOCGYnMM4_KPa9", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-03-05T02:00:10Z", "updated_at": "2022-03-05T02:00:10Z", "author_association": "OWNER", "body": "Requested feedback on Twitter here: https://twitter.com/simonw/status/1499927075930578948", 
"reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1160182768, "label": "Optional Pandas integration"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/412#issuecomment-1059649193", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/412", "id": 1059649193, "node_id": "IC_kwDOCGYnMM4_KPap", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-03-05T02:00:02Z", "updated_at": "2022-03-05T02:00:02Z", "author_association": "OWNER", "body": "Yeah, I imagine there are plenty of ways to do this with Pandas already - I'm opportunistically looking for a way to provide better integration with the rest of the Pandas situation from the work I've done in `sqlite-utils` already.\r\n\r\nMight be that this isn't worth doing at all.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1160182768, "label": "Optional Pandas integration"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/412#issuecomment-1059647114", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/412", "id": 1059647114, "node_id": "IC_kwDOCGYnMM4_KO6K", "user": {"value": 25778, "label": "eyeseast"}, "created_at": "2022-03-05T01:54:24Z", "updated_at": "2022-03-05T01:54:24Z", "author_association": "CONTRIBUTOR", "body": "I haven't tried this, but it looks like Pandas has a method for this: https://pandas.pydata.org/docs/reference/api/pandas.read_sql_query.html\r\n\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1160182768, "label": "Optional Pandas integration"}, "performed_via_github_app": null} {"html_url": 
"https://github.com/simonw/sqlite-utils/issues/412#issuecomment-1059646645", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/412", "id": 1059646645, "node_id": "IC_kwDOCGYnMM4_KOy1", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-03-05T01:53:10Z", "updated_at": "2022-03-05T01:53:10Z", "author_association": "OWNER", "body": "I'm not an experienced enough Pandas user to know if this design is right or not. I'm going to leave this open for a while and solicit some feedback.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1160182768, "label": "Optional Pandas integration"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/412#issuecomment-1059646543", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/412", "id": 1059646543, "node_id": "IC_kwDOCGYnMM4_KOxP", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-03-05T01:52:47Z", "updated_at": "2022-03-05T01:52:47Z", "author_association": "OWNER", "body": "I built a prototype of that second option and it looks pretty good:\r\n\r\n\"image\"\r\n\r\nHere's the `pandas.py` prototype:\r\n\r\n```python\r\nfrom .db import Database as _Database, Table as _Table, View as _View\r\nimport pandas as pd\r\nfrom typing import (\r\n Iterable,\r\n Union,\r\n Optional,\r\n)\r\n\r\n\r\nclass Database(_Database):\r\n def query(\r\n self, sql: str, params: Optional[Union[Iterable, dict]] = None\r\n ) -> pd.DataFrame:\r\n return pd.DataFrame(super().query(sql, params))\r\n\r\n def table(self, table_name: str, **kwargs) -> Union[\"Table\", \"View\"]:\r\n \"Return a table object, optionally configured with default options.\"\r\n klass = View if table_name in self.view_names() else Table\r\n return klass(self, table_name, **kwargs)\r\n\r\n\r\nclass PandasQueryable:\r\n def rows_where(\r\n self,\r\n where: str = 
None,\r\n where_args: Optional[Union[Iterable, dict]] = None,\r\n order_by: str = None,\r\n select: str = \"*\",\r\n limit: int = None,\r\n offset: int = None,\r\n ) -> pd.DataFrame:\r\n return pd.DataFrame(\r\n super().rows_where(\r\n where,\r\n where_args,\r\n order_by=order_by,\r\n select=select,\r\n limit=limit,\r\n offset=offset,\r\n )\r\n )\r\n\r\n\r\nclass Table(PandasQueryable, _Table):\r\n pass\r\n\r\n\r\nclass View(PandasQueryable, _View):\r\n pass\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1160182768, "label": "Optional Pandas integration"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/sqlite-utils/issues/412#issuecomment-1059646247", "issue_url": "https://api.github.com/repos/simonw/sqlite-utils/issues/412", "id": 1059646247, "node_id": "IC_kwDOCGYnMM4_KOsn", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-03-05T01:51:03Z", "updated_at": "2022-03-05T01:51:03Z", "author_association": "OWNER", "body": "I considered two ways of doing this.\r\n\r\nFirst, have methods such as `db.query_df()` and `table.rows_df` which do the same as `.query()` and `table.rows` but return a DataFrame instead of a generator of dictionaries.\r\n\r\nSecond, have a compatibility class that is imported separately such as:\r\n```python\r\nfrom sqlite_utils.pandas import Database\r\n```\r\nThen have the `.query()` and `.rows` and other similar methods return dataframes.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 1160182768, "label": "Optional Pandas integration"}, "performed_via_github_app": null}