github

This data as json, CSV

html_url	issue_url	id	node_id	user	created_at	updated_at	author_association	body	reactions	issue	performed_via_github_app
https://github.com/simonw/datasette/issues/1293#issuecomment-813112546	https://api.github.com/repos/simonw/datasette/issues/1293	813112546	MDEyOklzc3VlQ29tbWVudDgxMzExMjU0Ng==	9599	2021-04-04T23:02:45Z	2021-04-04T23:02:45Z	OWNER	I've done various pieces of research into this over the past few years. Capturing what I've discovered in this ticket. The SQLite C API has functions that can help with this: https://www.sqlite.org/c3ref/column_database_name.html details those. But they're not exposed in the Python SQLite library. Maybe it would be possible to use them via `ctypes`? My hunch is that I would have to re-implement the full `sqlite3` module with `ctypes`, which sounds daunting.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	849978964
https://github.com/simonw/datasette/issues/1293#issuecomment-813113175	https://api.github.com/repos/simonw/datasette/issues/1293	813113175	MDEyOklzc3VlQ29tbWVudDgxMzExMzE3NQ==	9599	2021-04-04T23:07:01Z	2021-04-04T23:07:01Z	OWNER	A more promising route I found involved the `db.set_authorizer` method. This can be used to log the permission checks that SQLite uses, including checks for permission to access specific columns of specific tables. For a while I thought this could work! ```pycon >>> def print_args(args, kwargs): ... print("args", args, "kwargs", kwargs) ... return sqlite3.SQLITE_OK >>> db = sqlite3.connect("fixtures.db") >>> db.execute('select from compound_primary_key join facetable on rowid').fetchall() args (21, None, None, None, None) kwargs {} args (20, 'compound_primary_key', 'pk1', 'main', None) kwargs {} args (20, 'compound_primary_key', 'pk2', 'main', None) kwargs {} args (20, 'compound_primary_key', 'content', 'main', None) kwargs {} args (20, 'facetable', 'pk', 'main', None) kwargs {} args (20, 'facetable', 'created', 'main', None) kwargs {} args (20, 'facetable', 'planet_int', 'main', None) kwargs {} args (20, 'facetable', 'on_earth', 'main', None) kwargs {} args (20, 'facetable', 'state', 'main', None) kwargs {} args (20, 'facetable', 'city_id', 'main', None) kwargs {} args (20, 'facetable', 'neighborhood', 'main', None) kwargs {} args (20, 'facetable', 'tags', 'main', None) kwargs {} args (20, 'facetable', 'complex_array', 'main', None) kwargs {} args (20, 'facetable', 'distinct_some_null', 'main', None) kwargs {} ``` Those `20` values (where 20 is `SQLITE_READ`) looked like they were checking permissions for the columns in the order they would be returned! Then I found a snag: ```pycon In [18]: db.execute('select 1 + 1 + (select max(rowid) from facetable)') args (21, None, None, None, None) kwargs {} args (31, None, 'max', None, None) kwargs {} args (20, 'facetable', 'pk', 'main', None) kwargs {} args (21, None, None, None, None) kwargs {} args (20, 'facetable', '', None, None) kwargs {} ``` Once a subselect is involved the order of the `20` checks no longer matches the order in which the columns are returned from the query.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	849978964
https://github.com/simonw/datasette/issues/1293#issuecomment-813113218	https://api.github.com/repos/simonw/datasette/issues/1293	813113218	MDEyOklzc3VlQ29tbWVudDgxMzExMzIxOA==	9599	2021-04-04T23:07:25Z	2021-04-04T23:07:25Z	OWNER	Here are all of the available constants: ```pycon In [3]: for k in dir(sqlite3): ...: if k.startswith("SQLITE_"): ...: print(k, getattr(sqlite3, k)) ...: SQLITE_ALTER_TABLE 26 SQLITE_ANALYZE 28 SQLITE_ATTACH 24 SQLITE_CREATE_INDEX 1 SQLITE_CREATE_TABLE 2 SQLITE_CREATE_TEMP_INDEX 3 SQLITE_CREATE_TEMP_TABLE 4 SQLITE_CREATE_TEMP_TRIGGER 5 SQLITE_CREATE_TEMP_VIEW 6 SQLITE_CREATE_TRIGGER 7 SQLITE_CREATE_VIEW 8 SQLITE_CREATE_VTABLE 29 SQLITE_DELETE 9 SQLITE_DENY 1 SQLITE_DETACH 25 SQLITE_DONE 101 SQLITE_DROP_INDEX 10 SQLITE_DROP_TABLE 11 SQLITE_DROP_TEMP_INDEX 12 SQLITE_DROP_TEMP_TABLE 13 SQLITE_DROP_TEMP_TRIGGER 14 SQLITE_DROP_TEMP_VIEW 15 SQLITE_DROP_TRIGGER 16 SQLITE_DROP_VIEW 17 SQLITE_DROP_VTABLE 30 SQLITE_FUNCTION 31 SQLITE_IGNORE 2 SQLITE_INSERT 18 SQLITE_OK 0 SQLITE_PRAGMA 19 SQLITE_READ 20 SQLITE_RECURSIVE 33 SQLITE_REINDEX 27 SQLITE_SAVEPOINT 32 SQLITE_SELECT 21 SQLITE_TRANSACTION 22 SQLITE_UPDATE 23 ```	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	849978964
https://github.com/simonw/datasette/issues/1293#issuecomment-813113403	https://api.github.com/repos/simonw/datasette/issues/1293	813113403	MDEyOklzc3VlQ29tbWVudDgxMzExMzQwMw==	9599	2021-04-04T23:08:48Z	2021-04-04T23:08:48Z	OWNER	Worth noting that adding `limit 0` to the query still causes it to conduct the permission checks, hopefully while avoiding doing any of the actual work of executing the query: ```pycon In [20]: db.execute('select * from compound_primary_key join facetable on facetable.rowid = compound_primary_key.rowid limit 0').fetchall() ...: args (21, None, None, None, None) kwargs {} args (20, 'compound_primary_key', 'pk1', 'main', None) kwargs {} args (20, 'compound_primary_key', 'pk2', 'main', None) kwargs {} args (20, 'compound_primary_key', 'content', 'main', None) kwargs {} args (20, 'facetable', 'pk', 'main', None) kwargs {} args (20, 'facetable', 'created', 'main', None) kwargs {} args (20, 'facetable', 'planet_int', 'main', None) kwargs {} args (20, 'facetable', 'on_earth', 'main', None) kwargs {} args (20, 'facetable', 'state', 'main', None) kwargs {} args (20, 'facetable', 'city_id', 'main', None) kwargs {} args (20, 'facetable', 'neighborhood', 'main', None) kwargs {} args (20, 'facetable', 'tags', 'main', None) kwargs {} args (20, 'facetable', 'complex_array', 'main', None) kwargs {} args (20, 'facetable', 'distinct_some_null', 'main', None) kwargs {} args (20, 'facetable', 'pk', 'main', None) kwargs {} args (20, 'compound_primary_key', 'ROWID', 'main', None) kwargs {} Out[20]: [] ```	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	849978964
https://github.com/simonw/datasette/issues/1293#issuecomment-813113653	https://api.github.com/repos/simonw/datasette/issues/1293	813113653	MDEyOklzc3VlQ29tbWVudDgxMzExMzY1Mw==	9599	2021-04-04T23:10:49Z	2021-04-04T23:10:49Z	OWNER	One option I've not fully explored yet: could I write my own custom SQLite C extension which exposes this functionality as a callable function? Then I could load that extension and run a SQL query something like this: ``` select database, table, column from analyze_query(:sql_query) ``` Where `analyze_query(...)` would be a fancy virtual table function of some sort that uses the underlying `sqlite3_column_database_name()` C functions to analyze the SQL query and return details of what it would return.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	849978964
https://github.com/simonw/datasette/issues/1293#issuecomment-813114933	https://api.github.com/repos/simonw/datasette/issues/1293	813114933	MDEyOklzc3VlQ29tbWVudDgxMzExNDkzMw==	9599	2021-04-04T23:19:22Z	2021-04-04T23:19:22Z	OWNER	I asked about this on the SQLite forum: https://sqlite.org/forum/forumpost/0180277fb7	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	849978964
https://github.com/simonw/datasette/issues/1293#issuecomment-813115414	https://api.github.com/repos/simonw/datasette/issues/1293	813115414	MDEyOklzc3VlQ29tbWVudDgxMzExNTQxNA==	9599	2021-04-04T23:23:34Z	2021-04-04T23:23:34Z	OWNER	The other approach I considered for this was to have my own SQL query parser running in Python, which could pick apart a complex query and figure out which column was sourced from which table. I dropped this idea because it felt that the moment `select ` came into play a pure parsing approach wouldn't work - I'd need knowledge of the schema in order to resolve the ``. A Python parser approach might be good enough to handle a subset of queries - those that don't use `select *` for example - and maybe that would be worth shipping? The feature doesn't have to be perfect for it to be useful.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	849978964
https://github.com/simonw/datasette/issues/1293#issuecomment-813115607	https://api.github.com/repos/simonw/datasette/issues/1293	813115607	MDEyOklzc3VlQ29tbWVudDgxMzExNTYwNw==	9599	2021-04-04T23:25:15Z	2021-04-04T23:25:15Z	OWNER	Oh wow, I just spotted https://github.com/macbre/sql-metadata > Uses tokenized query returned by python-sqlparse and generates query metadata. Extracts column names and tables used by the query. Provides a helper for normalization of SQL queries and tables aliases resolving. It's for MySQL, PostgreSQL and Hive right now but maybe getting it working with SQLite wouldn't be too hard?	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	849978964
https://github.com/simonw/datasette/issues/1293#issuecomment-813116177	https://api.github.com/repos/simonw/datasette/issues/1293	813116177	MDEyOklzc3VlQ29tbWVudDgxMzExNjE3Nw==	9599	2021-04-04T23:31:00Z	2021-04-04T23:31:00Z	OWNER	Sadly it doesn't do what I need. This query should only return one column, but instead I get back every column that was consulted by the query: <img width="597" alt="sql-metadata_-_Jupyter_Notebook" src="https://user-images.githubusercontent.com/9599/113524385-216ca000-9563-11eb-8a7e-cbbe5dfc02f6.png">	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	849978964