github
html_url | issue_url | id | node_id | user | created_at | updated_at | author_association | body | reactions | issue | performed_via_github_app |
---|---|---|---|---|---|---|---|---|---|---|---|
https://github.com/simonw/datasette/issues/1293#issuecomment-813112546 | https://api.github.com/repos/simonw/datasette/issues/1293 | 813112546 | MDEyOklzc3VlQ29tbWVudDgxMzExMjU0Ng== | 9599 | 2021-04-04T23:02:45Z | 2021-04-04T23:02:45Z | OWNER | I've done various pieces of research into this over the past few years. Capturing what I've discovered in this ticket. The SQLite C API has functions that can help with this: https://www.sqlite.org/c3ref/column_database_name.html documents `sqlite3_column_database_name()`, `sqlite3_column_table_name()` and `sqlite3_column_origin_name()`. But they're not exposed in the Python `sqlite3` library. Maybe it would be possible to use them via `ctypes`? My hunch is that I would have to re-implement the full `sqlite3` module with `ctypes`, which sounds daunting. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
849978964 | |
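To make the gap concrete: roughly the closest thing Python's `sqlite3` module does expose is `cursor.description`, which gives each result column's name but leaves the other six fields of every 7-tuple as `None`. A minimal sketch against a throwaway in-memory database; the `owners` and `items` tables are hypothetical.

```python
import sqlite3

# Throwaway in-memory database with two hypothetical tables
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table owners (id integer primary key, name text);
    create table items (id integer primary key, owner_id integer, name text);
    """
)

cursor = conn.execute(
    "select items.name, owners.name from items join owners on owners.id = items.owner_id"
)

# One 7-tuple per result column: only the first item (the name) is populated,
# the remaining six are always None
for column in cursor.description:
    print(column)

# Both result columns are called "name"; nothing says which table each came from
names = [column[0] for column in cursor.description]
print(names)
```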
https://github.com/simonw/datasette/issues/1293#issuecomment-813113175 | https://api.github.com/repos/simonw/datasette/issues/1293 | 813113175 | MDEyOklzc3VlQ29tbWVudDgxMzExMzE3NQ== | 9599 | 2021-04-04T23:07:01Z | 2021-04-04T23:07:01Z | OWNER | A more promising route I found involved the `db.set_authorizer` method. This can be used to log the permission checks that SQLite uses, including checks for permission to access specific columns of specific tables. For a while I thought this could work!

```pycon
>>> import sqlite3
>>> def print_args(*args, **kwargs):
...     print("args", args, "kwargs", kwargs)
...     return sqlite3.SQLITE_OK
...
>>> db = sqlite3.connect("fixtures.db")
>>> db.set_authorizer(print_args)
>>> db.execute('select * from compound_primary_key join facetable on rowid').fetchall()
args (21, None, None, None, None) kwargs {}
args (20, 'compound_primary_key', 'pk1', 'main', None) kwargs {}
args (20, 'compound_primary_key', 'pk2', 'main', None) kwargs {}
args (20, 'compound_primary_key', 'content', 'main', None) kwargs {}
args (20, 'facetable', 'pk', 'main', None) kwargs {}
args (20, 'facetable', 'created', 'main', None) kwargs {}
args (20, 'facetable', 'planet_int', 'main', None) kwargs {}
args (20, 'facetable', 'on_earth', 'main', None) kwargs {}
args (20, 'facetable', 'state', 'main', None) kwargs {}
args (20, 'facetable', 'city_id', 'main', None) kwargs {}
args (20, 'facetable', 'neighborhood', 'main', None) kwargs {}
args (20, 'facetable', 'tags', 'main', None) kwargs {}
args (20, 'facetable', 'complex_array', 'main', None) kwargs {}
args (20, 'facetable', 'distinct_some_null', 'main', None) kwargs {}
```

Those `20` values (where 20 is `SQLITE_READ`) looked like they were checking permissions for the columns in the order they would be returned!

Then I found a snag:

```pycon
In [18]: db.execute('select 1 + 1 + (select max(rowid) from facetable)')
args (21, None, None, None, None) kwargs {}
args (31, None, 'max', None, None) kwargs {}
args (20, 'facetable', 'pk', 'main', None) kwargs {}
args (21, None, None, None, None) kwargs {}
args (20, 'facetable', '', None, None) kwargs {}
```

Once a subselect is involved, the `20` checks no longer line up with the columns returned by the query: this query returns a single computed value, yet it still triggers reads against `facetable`. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
849978964 | |
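The heuristic above can be sketched as a small helper: install an authorizer that records only `SQLITE_READ` (action code 20) events, then compare them with the names in `cursor.description`. This is a minimal sketch against a hypothetical in-memory table `t`; `read_events()` is a made-up name for illustration. One hedged caveat: Python's `sqlite3` module caches prepared statements and the authorizer fires when a statement is compiled, so re-running identical SQL on the same connection may not log anything the second time.

```python
import sqlite3


def read_events(conn, sql):
    """Run `sql` and return (reads, column_names).

    reads lists every (table, column) pair passed to the authorizer with the
    SQLITE_READ action, in callback order; column_names comes from
    cursor.description. Hypothetical helper, for illustration only.
    """
    reads = []

    def authorizer(action, arg1, arg2, db_name, source):
        # For SQLITE_READ, arg1 is the table name and arg2 the column name
        if action == sqlite3.SQLITE_READ:
            reads.append((arg1, arg2))
        return sqlite3.SQLITE_OK  # allow everything, we only want to observe

    conn.set_authorizer(authorizer)
    try:
        cursor = conn.execute(sql)
        cursor.fetchall()
    finally:
        # Swap in a permissive no-op authorizer; passing None to remove it
        # is only supported from Python 3.11 onwards
        conn.set_authorizer(lambda *args: sqlite3.SQLITE_OK)
    return reads, [column[0] for column in cursor.description]


conn = sqlite3.connect(":memory:")
conn.execute("create table t (a integer, b text)")
conn.execute("insert into t values (1, 'x')")

# Simple query: one read per result column, in result order
simple_reads, simple_columns = read_events(conn, "select a, b from t")
print(simple_reads, simple_columns)

# Subselect: a single computed result column, yet t.a is still read
sub_reads, sub_columns = read_events(conn, "select 1 + 1 + (select max(a) from t)")
print(sub_reads, sub_columns)
```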
https://github.com/simonw/datasette/issues/1293#issuecomment-813113218 | https://api.github.com/repos/simonw/datasette/issues/1293 | 813113218 | MDEyOklzc3VlQ29tbWVudDgxMzExMzIxOA== | 9599 | 2021-04-04T23:07:25Z | 2021-04-04T23:07:25Z | OWNER | Here are all of the available constants:

```pycon
In [3]: for k in dir(sqlite3):
   ...:     if k.startswith("SQLITE_"):
   ...:         print(k, getattr(sqlite3, k))
   ...:
SQLITE_ALTER_TABLE 26
SQLITE_ANALYZE 28
SQLITE_ATTACH 24
SQLITE_CREATE_INDEX 1
SQLITE_CREATE_TABLE 2
SQLITE_CREATE_TEMP_INDEX 3
SQLITE_CREATE_TEMP_TABLE 4
SQLITE_CREATE_TEMP_TRIGGER 5
SQLITE_CREATE_TEMP_VIEW 6
SQLITE_CREATE_TRIGGER 7
SQLITE_CREATE_VIEW 8
SQLITE_CREATE_VTABLE 29
SQLITE_DELETE 9
SQLITE_DENY 1
SQLITE_DETACH 25
SQLITE_DONE 101
SQLITE_DROP_INDEX 10
SQLITE_DROP_TABLE 11
SQLITE_DROP_TEMP_INDEX 12
SQLITE_DROP_TEMP_TABLE 13
SQLITE_DROP_TEMP_TRIGGER 14
SQLITE_DROP_TEMP_VIEW 15
SQLITE_DROP_TRIGGER 16
SQLITE_DROP_VIEW 17
SQLITE_DROP_VTABLE 30
SQLITE_FUNCTION 31
SQLITE_IGNORE 2
SQLITE_INSERT 18
SQLITE_OK 0
SQLITE_PRAGMA 19
SQLITE_READ 20
SQLITE_RECURSIVE 33
SQLITE_REINDEX 27
SQLITE_SAVEPOINT 32
SQLITE_SELECT 21
SQLITE_TRANSACTION 22
SQLITE_UPDATE 23
```
| { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
849978964 | |
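Not every constant in that list is an authorizer action code: `SQLITE_OK`, `SQLITE_DENY` and `SQLITE_IGNORE` are what the callback returns, and `SQLITE_DONE` is a step result code, so their numbers overlap with real action codes (`SQLITE_DENY` and `SQLITE_CREATE_INDEX` are both 1). A small lookup table restricted to action codes makes `args (20, ...)` style logs easier to read. `ACTION_NAMES` and `describe()` are hypothetical helpers, not part of the `sqlite3` module.

```python
import sqlite3

# Authorizer *action* codes only; the callback return values (SQLITE_OK,
# SQLITE_DENY, SQLITE_IGNORE) and SQLITE_DONE are deliberately left out
# because their numbers collide with action codes
ACTION_CODE_NAMES = [
    "SQLITE_CREATE_INDEX", "SQLITE_CREATE_TABLE", "SQLITE_CREATE_TEMP_INDEX",
    "SQLITE_CREATE_TEMP_TABLE", "SQLITE_CREATE_TEMP_TRIGGER",
    "SQLITE_CREATE_TEMP_VIEW", "SQLITE_CREATE_TRIGGER", "SQLITE_CREATE_VIEW",
    "SQLITE_DELETE", "SQLITE_DROP_INDEX", "SQLITE_DROP_TABLE",
    "SQLITE_DROP_TEMP_INDEX", "SQLITE_DROP_TEMP_TABLE",
    "SQLITE_DROP_TEMP_TRIGGER", "SQLITE_DROP_TEMP_VIEW", "SQLITE_DROP_TRIGGER",
    "SQLITE_DROP_VIEW", "SQLITE_INSERT", "SQLITE_PRAGMA", "SQLITE_READ",
    "SQLITE_SELECT", "SQLITE_TRANSACTION", "SQLITE_UPDATE", "SQLITE_ATTACH",
    "SQLITE_DETACH", "SQLITE_ALTER_TABLE", "SQLITE_REINDEX", "SQLITE_ANALYZE",
    "SQLITE_CREATE_VTABLE", "SQLITE_DROP_VTABLE", "SQLITE_FUNCTION",
    "SQLITE_SAVEPOINT", "SQLITE_RECURSIVE",
]

# Map the numeric code (as passed to the callback) back to its name
ACTION_NAMES = {getattr(sqlite3, name): name for name in ACTION_CODE_NAMES}


def describe(action, arg1, arg2, db_name, source):
    """Render one authorizer callback as a readable line."""
    name = ACTION_NAMES.get(action, f"UNKNOWN({action})")
    return f"{name} {arg1!r} {arg2!r} {db_name!r} {source!r}"


print(describe(20, "facetable", "pk", "main", None))
```

Passing `describe` output to `print` inside an authorizer callback would turn the raw tuples from the sessions above into lines such as `SQLITE_READ 'facetable' 'pk' 'main' None`.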
https://github.com/simonw/datasette/issues/1293#issuecomment-813113403 | https://api.github.com/repos/simonw/datasette/issues/1293 | 813113403 | MDEyOklzc3VlQ29tbWVudDgxMzExMzQwMw== | 9599 | 2021-04-04T23:08:48Z | 2021-04-04T23:08:48Z | OWNER | Worth noting that adding `limit 0` to the query still causes it to conduct the permission checks, hopefully while avoiding doing any of the actual work of executing the query:

```pycon
In [20]: db.execute('select * from compound_primary_key join facetable on facetable.rowid = compound_primary_key.rowid limit 0').fetchall()
    ...:
args (21, None, None, None, None) kwargs {}
args (20, 'compound_primary_key', 'pk1', 'main', None) kwargs {}
args (20, 'compound_primary_key', 'pk2', 'main', None) kwargs {}
args (20, 'compound_primary_key', 'content', 'main', None) kwargs {}
args (20, 'facetable', 'pk', 'main', None) kwargs {}
args (20, 'facetable', 'created', 'main', None) kwargs {}
args (20, 'facetable', 'planet_int', 'main', None) kwargs {}
args (20, 'facetable', 'on_earth', 'main', None) kwargs {}
args (20, 'facetable', 'state', 'main', None) kwargs {}
args (20, 'facetable', 'city_id', 'main', None) kwargs {}
args (20, 'facetable', 'neighborhood', 'main', None) kwargs {}
args (20, 'facetable', 'tags', 'main', None) kwargs {}
args (20, 'facetable', 'complex_array', 'main', None) kwargs {}
args (20, 'facetable', 'distinct_some_null', 'main', None) kwargs {}
args (20, 'facetable', 'pk', 'main', None) kwargs {}
args (20, 'compound_primary_key', 'ROWID', 'main', None) kwargs {}
Out[20]: []
```
| { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
849978964 | |
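One way to spot-check the "hopefully" part: register a Python function that records every call, use it in the select list, and confirm that the authorizer still logs the column read while the function is never invoked. This is a sketch on a hypothetical in-memory table `t`; `expensive()` is a made-up stand-in for real per-row work. It only checks this one query shape, so treat it as evidence rather than a guarantee. Also note that blindly appending ` limit 0` to arbitrary user SQL will fail for queries that already end in a `limit` clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (a integer)")
conn.executemany("insert into t values (?)", [(1,), (2,), (3,)])

# A Python function exposed to SQL; every call is recorded so we can tell
# whether SQLite actually computed any rows
calls = []


def expensive(value):
    calls.append(value)
    return value * 2


conn.create_function("expensive", 1, expensive)

# Record only SQLITE_READ checks, allowing everything
reads = []


def authorizer(action, arg1, arg2, db_name, source):
    if action == sqlite3.SQLITE_READ:
        reads.append((arg1, arg2))
    return sqlite3.SQLITE_OK


conn.set_authorizer(authorizer)

rows = conn.execute("select expensive(a) from t limit 0").fetchall()
print("rows:", rows)    # no rows come back
print("reads:", reads)  # but the column read was still checked
print("calls:", calls)  # and the per-row function never ran
```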
https://github.com/simonw/datasette/issues/1293#issuecomment-813113653 | https://api.github.com/repos/simonw/datasette/issues/1293 | 813113653 | MDEyOklzc3VlQ29tbWVudDgxMzExMzY1Mw== | 9599 | 2021-04-04T23:10:49Z | 2021-04-04T23:10:49Z | OWNER | One option I've not fully explored yet: could I write my own custom SQLite C extension which exposes this functionality as a callable function? Then I could load that extension and run a SQL query something like this:

```sql
select "database", "table", "column" from analyze_query(:sql_query)
```

Where `analyze_query(...)` would be a table-valued function (implemented as a virtual table) that uses the underlying `sqlite3_column_database_name()` family of C functions to analyze the SQL query and return details of the columns it would return. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
849978964 | |
https://github.com/simonw/datasette/issues/1293#issuecomment-813114933 | https://api.github.com/repos/simonw/datasette/issues/1293 | 813114933 | MDEyOklzc3VlQ29tbWVudDgxMzExNDkzMw== | 9599 | 2021-04-04T23:19:22Z | 2021-04-04T23:19:22Z | OWNER | I asked about this on the SQLite forum: https://sqlite.org/forum/forumpost/0180277fb7 | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
849978964 | |
https://github.com/simonw/datasette/issues/1293#issuecomment-813115414 | https://api.github.com/repos/simonw/datasette/issues/1293 | 813115414 | MDEyOklzc3VlQ29tbWVudDgxMzExNTQxNA== | 9599 | 2021-04-04T23:23:34Z | 2021-04-04T23:23:34Z | OWNER | The other approach I considered for this was to have my own SQL query parser running in Python, which could pick apart a complex query and figure out which column was sourced from which table. I dropped this idea because it felt that the moment `select *` came into play, a pure parsing approach wouldn't work - I'd need knowledge of the schema in order to resolve the `*`. A Python parser approach might be good enough to handle a subset of queries - those that don't use `select *`, for example - and maybe that would be worth shipping? The feature doesn't have to be perfect for it to be useful. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
849978964 | |
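A pure parser can't know what `*` expands to, but the schema can tell it. A minimal sketch of that missing piece, assuming the parser has already worked out which tables the query reads from (not shown): look up each table's columns with the `pragma_table_info()` table-valued function (available in SQLite 3.16 and later). `expand_star()` is a hypothetical helper, and the two tables are cut-down stand-ins for the fixtures tables, not their real schemas.

```python
import sqlite3


def expand_star(conn, tables):
    """Return the (table, column) pairs that `select *` over `tables` yields.

    Assumes plain joins: USING and NATURAL joins omit duplicated columns,
    which this sketch ignores.
    """
    pairs = []
    for table in tables:
        # Columns in declaration order, which is the order `*` returns them
        rows = conn.execute(
            "select name from pragma_table_info(?) order by cid", (table,)
        ).fetchall()
        pairs.extend((table, name) for (name,) in rows)
    return pairs


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table compound_primary_key (
        pk1 text, pk2 text, content text, primary key (pk1, pk2)
    );
    create table facetable (pk integer primary key, created text, state text);
    """
)

pairs = expand_star(conn, ["compound_primary_key", "facetable"])
print(pairs)
```

The column names line up with what `cursor.description` reports for `select * from compound_primary_key join facetable`, but unlike `description` each name now carries its source table.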
https://github.com/simonw/datasette/issues/1293#issuecomment-813115607 | https://api.github.com/repos/simonw/datasette/issues/1293 | 813115607 | MDEyOklzc3VlQ29tbWVudDgxMzExNTYwNw== | 9599 | 2021-04-04T23:25:15Z | 2021-04-04T23:25:15Z | OWNER | Oh wow, I just spotted https://github.com/macbre/sql-metadata

> Uses tokenized query returned by python-sqlparse and generates query metadata. Extracts column names and tables used by the query. Provides a helper for normalization of SQL queries and tables aliases resolving.

It's for MySQL, PostgreSQL and Hive right now but maybe getting it working with SQLite wouldn't be too hard? | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
849978964 | |
https://github.com/simonw/datasette/issues/1293#issuecomment-813116177 | https://api.github.com/repos/simonw/datasette/issues/1293 | 813116177 | MDEyOklzc3VlQ29tbWVudDgxMzExNjE3Nw== | 9599 | 2021-04-04T23:31:00Z | 2021-04-04T23:31:00Z | OWNER | Sadly it doesn't do what I need. This query should only return one column, but instead I get back every column that was consulted by the query: <img width="597" alt="sql-metadata_-_Jupyter_Notebook" src="https://user-images.githubusercontent.com/9599/113524385-216ca000-9563-11eb-8a7e-cbbe5dfc02f6.png"> | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
849978964 |