html_url,issue_url,id,node_id,user,user_label,created_at,updated_at,author_association,body,reactions,issue,issue_label,performed_via_github_app https://github.com/simonw/datasette/issues/1576#issuecomment-1030530071,https://api.github.com/repos/simonw/datasette/issues/1576,1030530071,IC_kwDOBm6k_c49bKQX,9599,simonw,2022-02-05T05:21:35Z,2022-02-05T05:21:35Z,OWNER,New documentation section: https://docs.datasette.io/en/latest/internals.html#datasette-tracer,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1087181951,Traces should include SQL executed by subtasks created with `asyncio.gather`, https://github.com/simonw/datasette/issues/1576#issuecomment-1030528532,https://api.github.com/repos/simonw/datasette/issues/1576,1030528532,IC_kwDOBm6k_c49bJ4U,9599,simonw,2022-02-05T05:09:57Z,2022-02-05T05:09:57Z,OWNER,Needs documentation. I'll document `from datasette.tracer import trace` too.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1087181951,Traces should include SQL executed by subtasks created with `asyncio.gather`, https://github.com/simonw/datasette/issues/1576#issuecomment-1030525218,https://api.github.com/repos/simonw/datasette/issues/1576,1030525218,IC_kwDOBm6k_c49bJEi,9599,simonw,2022-02-05T04:45:11Z,2022-02-05T04:45:11Z,OWNER,"Got a prototype working with `contextvars` - it identified two parallel executing queries using the patch from above: ![CleanShot 2022-02-04 at 20 41 50@2x](https://user-images.githubusercontent.com/9599/152628949-cf766b13-13cf-4831-b48d-2f23cadb6a05.png) ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1087181951,Traces should include SQL executed by subtasks created with `asyncio.gather`, https://github.com/simonw/datasette/issues/1576#issuecomment-1017112543,https://api.github.com/repos/simonw/datasette/issues/1576,1017112543,IC_kwDOBm6k_c48n-ff,9599,simonw,2022-01-20T04:35:00Z,2022-02-05T04:33:46Z,OWNER,I dropped support for Python 3.6 in fae3983c51f4a3aca8335f3e01ff85ef27076fbf so now free to use `contextvars` for this.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1087181951,Traces should include SQL executed by subtasks created with `asyncio.gather`, https://github.com/simonw/datasette/issues/1576#issuecomment-1027635925,https://api.github.com/repos/simonw/datasette/issues/1576,1027635925,IC_kwDOBm6k_c49QHrV,9599,simonw,2022-02-02T06:47:20Z,2022-02-02T06:47:20Z,OWNER,"Here's what I was hacking around with when I uncovered this problem: ```diff diff --git a/datasette/views/table.py b/datasette/views/table.py index 77fb285..8c57d08 100644 --- a/datasette/views/table.py +++ b/datasette/views/table.py @@ -1,3 +1,4 @@ +import asyncio import urllib import itertools import json @@ -615,44 +616,37 @@ class TableView(RowTableShared): if request.args.get(""_timelimit""): extra_args[""custom_time_limit""] = int(request.args.get(""_timelimit"")) - # Execute the main query! 
- results = await db.execute(sql, params, truncate=True, **extra_args) - - # Calculate the total count for this query - filtered_table_rows_count = None - if ( - not db.is_mutable - and self.ds.inspect_data - and count_sql == f""select count(*) from {table} "" - ): - # We can use a previously cached table row count - try: - filtered_table_rows_count = self.ds.inspect_data[database][""tables""][ - table - ][""count""] - except KeyError: - pass - - # Otherwise run a select count(*) ... - if count_sql and filtered_table_rows_count is None and not nocount: - try: - count_rows = list(await db.execute(count_sql, from_sql_params)) - filtered_table_rows_count = count_rows[0][0] - except QueryInterrupted: - pass - - # Faceting - if not self.ds.setting(""allow_facet"") and any( - arg.startswith(""_facet"") for arg in request.args - ): - raise BadRequest(""_facet= is not allowed"") + async def execute_count(): + # Calculate the total count for this query + filtered_table_rows_count = None + if ( + not db.is_mutable + and self.ds.inspect_data + and count_sql == f""select count(*) from {table} "" + ): + # We can use a previously cached table row count + try: + filtered_table_rows_count = self.ds.inspect_data[database][ + ""tables"" + ][table][""count""] + except KeyError: + pass + + if count_sql and filtered_table_rows_count is None and not nocount: + try: + count_rows = list(await db.execute(count_sql, from_sql_params)) + filtered_table_rows_count = count_rows[0][0] + except QueryInterrupted: + pass + + return filtered_table_rows_count + + filtered_table_rows_count = await execute_count() # pylint: disable=no-member facet_classes = list( itertools.chain.from_iterable(pm.hook.register_facet_classes()) ) - facet_results = {} - facets_timed_out = [] facet_instances = [] for klass in facet_classes: facet_instances.append( @@ -668,33 +662,58 @@ class TableView(RowTableShared): ) ) - if not nofacet: - for facet in facet_instances: - ( - instance_facet_results, - instance_facets_timed_out, - ) = await facet.facet_results() - for facet_info in instance_facet_results: - base_key = facet_info[""name""] - key = base_key - i = 1 - while key in facet_results: - i += 1 - key = f""{base_key}_{i}"" - facet_results[key] = facet_info - facets_timed_out.extend(instance_facets_timed_out) - - # Calculate suggested facets - suggested_facets = [] - if ( - self.ds.setting(""suggest_facets"") - and self.ds.setting(""allow_facet"") - and not _next - and not nofacet - and not nosuggest - ): - for facet in facet_instances: - suggested_facets.extend(await facet.suggest()) + async def execute_suggested_facets(): + # Calculate suggested facets + suggested_facets = [] + if ( + self.ds.setting(""suggest_facets"") + and self.ds.setting(""allow_facet"") + and not _next + and not nofacet + and not nosuggest + ): + for facet in facet_instances: + suggested_facets.extend(await facet.suggest()) + return suggested_facets + + async def execute_facets(): + facet_results = {} + facets_timed_out = [] + if not self.ds.setting(""allow_facet"") and any( + arg.startswith(""_facet"") for arg in request.args + ): + raise BadRequest(""_facet= is not allowed"") + + if not nofacet: + for facet in facet_instances: + ( + instance_facet_results, + instance_facets_timed_out, + ) = await facet.facet_results() + for facet_info in instance_facet_results: + base_key = facet_info[""name""] + key = base_key + i = 1 + while key in facet_results: + i += 1 + key = f""{base_key}_{i}"" + facet_results[key] = facet_info + 
facets_timed_out.extend(instance_facets_timed_out) + + return facet_results, facets_timed_out + + # Execute the main query, facets and facet suggestions in parallel: + ( + results, + suggested_facets, + (facet_results, facets_timed_out), + ) = await asyncio.gather( + db.execute(sql, params, truncate=True, **extra_args), + execute_suggested_facets(), + execute_facets(), + ) + + results = await db.execute(sql, params, truncate=True, **extra_args) # Figure out columns and rows for the query columns = [r[0] for r in results.description] ``` It's a hacky attempt at running some of the table page queries in parallel to see what happens.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1087181951,Traces should include SQL executed by subtasks created with `asyncio.gather`, https://github.com/simonw/datasette/issues/1576#issuecomment-1000935523,https://api.github.com/repos/simonw/datasette/issues/1576,1000935523,IC_kwDOBm6k_c47qRBj,9599,simonw,2021-12-24T21:33:05Z,2021-12-24T21:33:05Z,OWNER,"Another option would be to attempt to import `contextvars` and, if the import fails (for Python 3.6) continue using the current mechanism - then let Python 3.6 users know in the documentation that under Python 3.6 they will miss out on nested traces.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1087181951,Traces should include SQL executed by subtasks created with `asyncio.gather`, https://github.com/simonw/datasette/issues/1576#issuecomment-999990414,https://api.github.com/repos/simonw/datasette/issues/1576,999990414,IC_kwDOBm6k_c47mqSO,9599,simonw,2021-12-23T02:08:39Z,2021-12-23T18:16:35Z,OWNER,"It's tiny: I'm tempted to vendor it. 
https://github.com/Skyscanner/aiotask-context/blob/master/aiotask_context/__init__.py No, I'll add it as a pinned dependency, which I can then drop when I drop 3.6 support.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1087181951,Traces should include SQL executed by subtasks created with `asyncio.gather`, https://github.com/simonw/datasette/issues/1576#issuecomment-999987418,https://api.github.com/repos/simonw/datasette/issues/1576,999987418,IC_kwDOBm6k_c47mpja,9599,simonw,2021-12-23T01:59:58Z,2021-12-23T02:02:12Z,OWNER,"Another option: https://github.com/Skyscanner/aiotask-context - looks like it might be better as it's been updated for Python 3.7 in this commit https://github.com/Skyscanner/aiotask-context/commit/67108c91d2abb445655cc2af446fdb52ca7890c4 The Skyscanner one doesn't attempt to wrap any existing factories, but that's OK for my purposes since I don't need to handle arbitrary `asyncio` code written by other people.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1087181951,Traces should include SQL executed by subtasks created with `asyncio.gather`, https://github.com/simonw/datasette/issues/1576#issuecomment-999876666,https://api.github.com/repos/simonw/datasette/issues/1576,999876666,IC_kwDOBm6k_c47mOg6,9599,simonw,2021-12-22T20:59:22Z,2021-12-22T21:18:09Z,OWNER,"This article is relevant: [Context information storage for asyncio](https://blog.sqreen.com/asyncio/) - in particular the section https://blog.sqreen.com/asyncio/#context-inheritance-between-tasks which describes exactly the problem I have and their solution, which involves this trickery: ```python def request_task_factory(loop, coro): child_task = asyncio.tasks.Task(coro, loop=loop) parent_task = asyncio.Task.current_task(loop=loop) current_request = getattr(parent_task, 'current_request', None) setattr(child_task, 'current_request', current_request) return child_task loop = asyncio.get_event_loop() loop.set_task_factory(request_task_factory) ``` They released their solution as a library: https://pypi.org/project/aiocontext/ and https://github.com/sqreen/AioContext - but that company was acquired by Datadog back in April and doesn't seem to be actively maintaining their open source stuff any more: https://twitter.com/SqreenIO/status/1384906075506364417","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1087181951,Traces should include SQL executed by subtasks created with `asyncio.gather`, https://github.com/simonw/datasette/issues/1576#issuecomment-999878907,https://api.github.com/repos/simonw/datasette/issues/1576,999878907,IC_kwDOBm6k_c47mPD7,9599,simonw,2021-12-22T21:03:49Z,2021-12-22T21:10:46Z,OWNER,"`context_vars` can solve this but they were introduced in Python 3.7: https://www.python.org/dev/peps/pep-0567/ Python 3.6 support ends in a few days time, and it looks like Glitch has updated to 3.7 now - so maybe I can get away with Datasette needing 3.7 these days? 
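For reference, here is a minimal sketch of the pattern being discussed, using the stdlib `contextvars` module rather than either third-party library (so it assumes Python 3.7+). The names `current_trace_id` and `execute_sql` are illustrative only, not Datasette's actual API - the point is just that tasks created by `asyncio.gather()` receive a copy of the parent context, so a value set in the request handler is visible in the sub-tasks:

```python
import asyncio
import contextvars

# Illustrative names only - not the real datasette.tracer API.
current_trace_id = contextvars.ContextVar('current_trace_id', default=None)

async def execute_sql(sql):
    # Tasks created by asyncio.gather() run with a copy of the parent's
    # context, so this sees the trace ID set in handle_request().
    print(f'trace={current_trace_id.get()} sql={sql}')

async def handle_request():
    current_trace_id.set('request-1234')
    # Both sub-tasks inherit the context, so their queries can be grouped
    # under the same trace.
    await asyncio.gather(execute_sql('select 1'), execute_sql('select 2'))

asyncio.run(handle_request())
```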
Tweeted about that here: https://twitter.com/simonw/status/1473761478155010048","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1087181951,Traces should include SQL executed by subtasks created with `asyncio.gather`, https://github.com/simonw/datasette/issues/1576#issuecomment-999874886,https://api.github.com/repos/simonw/datasette/issues/1576,999874886,IC_kwDOBm6k_c47mOFG,9599,simonw,2021-12-22T20:55:42Z,2021-12-22T20:57:28Z,OWNER,"One way to solve this would be to introduce a `set_task_id()` method, which sets an ID which will be returned by `get_task_id()` instead of using `id(current_task(loop=loop))`. It would be really nice if I could solve this using `with` syntax somehow. Something like: ```python with trace_child_tasks(): ( suggested_facets, (facet_results, facets_timed_out), ) = await asyncio.gather( execute_suggested_facets(), execute_facets(), ) ```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1087181951,Traces should include SQL executed by subtasks created with `asyncio.gather`, https://github.com/simonw/datasette/issues/1576#issuecomment-999874484,https://api.github.com/repos/simonw/datasette/issues/1576,999874484,IC_kwDOBm6k_c47mN-0,9599,simonw,2021-12-22T20:54:52Z,2021-12-22T20:54:52Z,OWNER,"Here's the full current relevant code from `tracer.py`: https://github.com/simonw/datasette/blob/ace86566b28280091b3844cf5fbecd20158e9004/datasette/tracer.py#L8-L64 ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1087181951,Traces should include SQL executed by subtasks created with `asyncio.gather`,
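For completeness, here is a hedged sketch of how the `with trace_child_tasks():` idea from this thread could be built on top of `contextvars`. It is an illustration under assumed names (`trace_task_id`, `get_task_id`), not necessarily what `datasette/tracer.py` actually ships:

```python
import asyncio
import contextvars
from contextlib import contextmanager

# Assumed name, for illustration - not necessarily the real tracer internals.
trace_task_id = contextvars.ContextVar('trace_task_id', default=None)

def get_task_id():
    # Prefer an explicitly recorded parent task ID, falling back to the
    # id() of whatever task is currently running.
    override = trace_task_id.get()
    if override is not None:
        return override
    return id(asyncio.current_task())

@contextmanager
def trace_child_tasks():
    # Record the calling task's ID in a ContextVar. Tasks spawned inside
    # the block (for example via asyncio.gather) inherit a copy of this
    # context, so their traces group under the parent request's task.
    token = trace_task_id.set(id(asyncio.current_task()))
    try:
        yield
    finally:
        trace_task_id.reset(token)
```

Used around an `asyncio.gather(...)` call like the ones in this thread, the SQL executed by the facet sub-tasks would then report the parent task's ID to the tracer.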