html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,issue,performed_via_github_app
https://github.com/simonw/datasette/issues/1744#issuecomment-1129251699,https://api.github.com/repos/simonw/datasette/issues/1744,1129251699,IC_kwDOBm6k_c5DTwNz,9599,2022-05-17T19:44:47Z,2022-05-17T19:46:38Z,OWNER,Updated docs: https://docs.datasette.io/en/latest/getting_started.html#using-datasette-on-your-own-computer and https://docs.datasette.io/en/latest/cli-reference.html#datasette-serve-help,"{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1239008850,
https://github.com/simonw/datasette/issues/1745#issuecomment-1129252603,https://api.github.com/repos/simonw/datasette/issues/1745,1129252603,IC_kwDOBm6k_c5DTwb7,9599,2022-05-17T19:45:51Z,2022-05-17T19:45:51Z,OWNER,Now documented here: https://docs.datasette.io/en/latest/contributing.html#running-cog,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1239080102,
https://github.com/simonw/datasette/issues/1744#issuecomment-1129243427,https://api.github.com/repos/simonw/datasette/issues/1744,1129243427,IC_kwDOBm6k_c5DTuMj,9599,2022-05-17T19:35:02Z,2022-05-17T19:35:02Z,OWNER,"One thing to note is that the `datasette-copy-to-memory` plugin broke with a locked file, because it does this: https://github.com/simonw/datasette-copy-to-memory/blob/d541c18a78ae6f707a8f9b1e7fc4c020a9f68f2e/datasette_copy_to_memory/__init__.py#L27
```python
tmp.execute(""ATTACH DATABASE ? AS _copy_from"", [db.path])
```
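Purely as an illustration - this sketch is an editorial addition rather than the plugin's actual code - the URI-filename form of that ATTACH might look something like this, using a placeholder path and assuming the attaching connection is opened with `uri=True` so that `file:` URIs are honoured:
```python
import sqlite3

# Sketch only: uri=True enables URI filename handling, so ATTACH will accept file: URIs
tmp = sqlite3.connect(':memory:', uri=True)

db_path = '/path/to/locked.db'  # placeholder path

# mode=ro opens the attached database read-only, nolock=1 skips SQLite's file locking
tmp.execute(
    'ATTACH DATABASE ? AS _copy_from',
    ['file:{}?mode=ro&nolock=1'.format(db_path)],
)
```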
The plugin's ATTACH call would need to use a URI filename like this too for it to work with locked files.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1239008850,
https://github.com/simonw/datasette/issues/1744#issuecomment-1129241873,https://api.github.com/repos/simonw/datasette/issues/1744,1129241873,IC_kwDOBm6k_c5DTt0R,9599,2022-05-17T19:33:16Z,2022-05-17T19:33:16Z,OWNER,"I'm going to skip adding a test for this - the test logic would have to be pretty convoluted to exercise it properly, and it's a pretty minor and low-risk feature in the scheme of things.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1239008850,
https://github.com/simonw/datasette/issues/1744#issuecomment-1129241283,https://api.github.com/repos/simonw/datasette/issues/1744,1129241283,IC_kwDOBm6k_c5DTtrD,9599,2022-05-17T19:32:35Z,2022-05-17T19:32:35Z,OWNER,"I tried writing a test like this:
```python
@pytest.mark.parametrize(""locked"", (True, False))
def test_locked_sqlite_db(tmp_path_factory, locked):
dir = tmp_path_factory.mktemp(""test_locked_sqlite_db"")
test_db = str(dir / ""test.db"")
sqlite3.connect(test_db).execute(""create table t (id integer primary key)"")
if locked:
fp = open(test_db, ""w"")
fcntl.lockf(fp.fileno(), fcntl.LOCK_EX)
runner = CliRunner()
result = runner.invoke(
cli,
[
""serve"",
""--memory"",
""--get"",
""/test"",
],
catch_exceptions=False,
)
```
But it didn't work, because the test runs in the same process - so taking an exclusive lock on that file didn't cause an error when the test later tried to access it via Datasette!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1239008850,
https://github.com/simonw/datasette/issues/1744#issuecomment-1129187486,https://api.github.com/repos/simonw/datasette/issues/1744,1129187486,IC_kwDOBm6k_c5DTgie,9599,2022-05-17T18:28:49Z,2022-05-17T18:28:49Z,OWNER,I think I can do that with `fcntl.flock()`: https://docs.python.org/3/library/fcntl.html#fcntl.flock,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1239008850,
https://github.com/simonw/datasette/issues/1744#issuecomment-1129185356,https://api.github.com/repos/simonw/datasette/issues/1744,1129185356,IC_kwDOBm6k_c5DTgBM,9599,2022-05-17T18:26:26Z,2022-05-17T18:26:26Z,OWNER,Not sure how to test this - I'd need to open my own lock against a database file somehow.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1239008850,
https://github.com/simonw/datasette/issues/1744#issuecomment-1129184908,https://api.github.com/repos/simonw/datasette/issues/1744,1129184908,IC_kwDOBm6k_c5DTf6M,9599,2022-05-17T18:25:57Z,2022-05-17T18:25:57Z,OWNER,"I knocked out a quick prototype of this and it worked!
datasette ~/Library/Application\ Support/Google/Chrome/Default/History --nolock
Here's the prototype diff:
```diff
diff --git a/datasette/app.py b/datasette/app.py
index b7b8437..f43700d 100644
--- a/datasette/app.py
+++ b/datasette/app.py
@@ -213,6 +213,7 @@ class Datasette:
config_dir=None,
pdb=False,
crossdb=False,
+ nolock=False,
):
assert config_dir is None or isinstance(
config_dir, Path
@@ -238,6 +239,7 @@ class Datasette:
self.databases = collections.OrderedDict()
self._refresh_schemas_lock = asyncio.Lock()
self.crossdb = crossdb
+ self.nolock = nolock
if memory or crossdb or not self.files:
self.add_database(Database(self, is_memory=True), name=""_memory"")
# memory_name is a random string so that each Datasette instance gets its own
diff --git a/datasette/cli.py b/datasette/cli.py
index 3c6e1b2..7e44665 100644
--- a/datasette/cli.py
+++ b/datasette/cli.py
@@ -452,6 +452,11 @@ def uninstall(packages, yes):
is_flag=True,
help=""Enable cross-database joins using the /_memory database"",
)
+@click.option(
+ ""--nolock"",
+ is_flag=True,
+ help=""Ignore locking and open locked files in read-only mode"",
+)
@click.option(
""--ssl-keyfile"",
help=""SSL key file"",
@@ -486,6 +491,7 @@ def serve(
open_browser,
create,
crossdb,
+ nolock,
ssl_keyfile,
ssl_certfile,
return_instance=False,
@@ -545,6 +551,7 @@ def serve(
version_note=version_note,
pdb=pdb,
crossdb=crossdb,
+ nolock=nolock,
)
# if files is a single directory, use that as config_dir=
diff --git a/datasette/database.py b/datasette/database.py
index 44d3266..fa55804 100644
--- a/datasette/database.py
+++ b/datasette/database.py
@@ -89,6 +89,8 @@ class Database:
# mode=ro or immutable=1?
if self.is_mutable:
qs = ""?mode=ro""
+ if self.ds.nolock:
+ qs += ""&nolock=1""
else:
qs = ""?immutable=1""
assert not (write and not self.is_mutable)
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1239008850,
https://github.com/simonw/datasette/issues/1742#issuecomment-1128052948,https://api.github.com/repos/simonw/datasette/issues/1742,1128052948,IC_kwDOBm6k_c5DPLjU,9599,2022-05-16T19:28:31Z,2022-05-16T19:28:31Z,OWNER,"The trace mechanism is a bit gnarly - it's actually done by some ASGI middleware I wrote, so I'm pretty sure the bug is in there somewhere: https://github.com/simonw/datasette/blob/280ff372ab30df244f6c54f6f3002da57334b3d7/datasette/tracer.py#L73","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1237586379,
https://github.com/simonw/datasette/issues/1742#issuecomment-1128033018,https://api.github.com/repos/simonw/datasette/issues/1742,1128033018,IC_kwDOBm6k_c5DPGr6,9599,2022-05-16T19:06:38Z,2022-05-16T19:06:38Z,OWNER,The same URL with `.json` instead works fine: https://calands.datasettes.com/calands/CPAD_2020a_SuperUnits.json?_sort=id&id__exact=4&_labels=on&_trace=1,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1237586379,
https://github.com/simonw/datasette/issues/1739#issuecomment-1117662420,https://api.github.com/repos/simonw/datasette/issues/1739,1117662420,IC_kwDOBm6k_c5CnizU,9599,2022-05-04T18:21:18Z,2022-05-04T18:21:18Z,OWNER,That prototype is now public: https://github.com/simonw/datasette-lite,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1223699280,
https://github.com/simonw/datasette/issues/1739#issuecomment-1116215371,https://api.github.com/repos/simonw/datasette/issues/1739,1116215371,IC_kwDOBm6k_c5CiBhL,9599,2022-05-03T15:12:16Z,2022-05-03T15:12:16Z,OWNER,"That worked - both DBs are 304 for me now on a subsequent load of the page:
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1223699280,
https://github.com/simonw/datasette/issues/1739#issuecomment-1116183369,https://api.github.com/repos/simonw/datasette/issues/1739,1116183369,IC_kwDOBm6k_c5Ch5tJ,9599,2022-05-03T14:43:14Z,2022-05-03T14:43:14Z,OWNER,Relevant tests start here: https://github.com/simonw/datasette/blob/d60f163528f466b1127b2935c3b6869c34fd6545/tests/test_html.py#L395,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1223699280,
https://github.com/simonw/datasette/issues/1739#issuecomment-1116180599,https://api.github.com/repos/simonw/datasette/issues/1739,1116180599,IC_kwDOBm6k_c5Ch5B3,9599,2022-05-03T14:40:32Z,2022-05-03T14:40:32Z,OWNER,"Database downloads are served here: https://github.com/simonw/datasette/blob/d60f163528f466b1127b2935c3b6869c34fd6545/datasette/views/database.py#L186-L192
Here's `AsgiFileDownload`: https://github.com/simonw/datasette/blob/d60f163528f466b1127b2935c3b6869c34fd6545/datasette/utils/asgi.py#L410-L430
I can add an `etag=` parameter to that and populate it with `db.hash`, if it is populated (which it always should be for immutable databases that can be downloaded).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1223699280,
https://github.com/simonw/datasette/issues/1739#issuecomment-1116178727,https://api.github.com/repos/simonw/datasette/issues/1739,1116178727,IC_kwDOBm6k_c5Ch4kn,9599,2022-05-03T14:38:46Z,2022-05-03T14:38:46Z,OWNER,"Reminded myself how this works by reviewing `conditional-get`: https://github.com/simonw/conditional-get/blob/db6dfec0a296080aaf68fcd80e55fb3f0714e738/conditional_get/cli.py#L33-L52
Simply add an `If-None-Match: last-known-etag` header to the request and check that the response is a status 304 with an empty body.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1223699280,
https://github.com/simonw/datasette/issues/1739#issuecomment-1115760104,https://api.github.com/repos/simonw/datasette/issues/1739,1115760104,IC_kwDOBm6k_c5CgSXo,9599,2022-05-03T05:50:19Z,2022-05-03T05:50:19Z,OWNER,Here's how Starlette does it: https://github.com/encode/starlette/blob/830f3486537916bae6b46948ff922adc14a22b7c/starlette/staticfiles.py#L213,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1223699280,
https://github.com/simonw/datasette/issues/1732#issuecomment-1115533820,https://api.github.com/repos/simonw/datasette/issues/1732,1115533820,IC_kwDOBm6k_c5CfbH8,9599,2022-05-03T01:42:25Z,2022-05-03T01:42:25Z,OWNER,"Thanks, this definitely sounds like a bug. Do you have simple steps to reproduce this?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1221849746,
https://github.com/simonw/datasette/issues/1737#issuecomment-1115470180,https://api.github.com/repos/simonw/datasette/issues/1737,1115470180,IC_kwDOBm6k_c5CfLlk,9599,2022-05-02T23:39:29Z,2022-05-02T23:39:29Z,OWNER,"Test ran in 38 seconds and passed! https://github.com/simonw/datasette/runs/6265954274?check_suite_focus=true
I'm going to have it run on every commit and PR.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1223459734,
https://github.com/simonw/datasette/issues/1737#issuecomment-1115468193,https://api.github.com/repos/simonw/datasette/issues/1737,1115468193,IC_kwDOBm6k_c5CfLGh,9599,2022-05-02T23:35:26Z,2022-05-02T23:35:26Z,OWNER,"https://github.com/simonw/datasette/runs/6265915080?check_suite_focus=true failed but looks like it passed because I forgot to use `set -e` at the start of the bash script.
It failed because it didn't have `build` available.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1223459734,
https://github.com/simonw/datasette/issues/1737#issuecomment-1115464097,https://api.github.com/repos/simonw/datasette/issues/1737,1115464097,IC_kwDOBm6k_c5CfKGh,9599,2022-05-02T23:27:40Z,2022-05-02T23:27:40Z,OWNER,"I'm going to start off by running this manually - I may run it on every commit once this is all a little bit more stable.
I can base the workflow on https://github.com/simonw/scrape-hacker-news-by-domain/blob/main/.github/workflows/scrape.yml","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1223459734,
https://github.com/simonw/datasette/issues/1737#issuecomment-1115462720,https://api.github.com/repos/simonw/datasette/issues/1737,1115462720,IC_kwDOBm6k_c5CfJxA,9599,2022-05-02T23:25:03Z,2022-05-02T23:25:03Z,OWNER,"Here's a script that seems to work. It builds the wheel, starts a Python web server that serves the wheel, runs a test with `shot-scraper` and then shuts down the server again.
```bash
#!/bin/bash
# Build the wheel
python3 -m build
# Find name of wheel (basename strips off the dist/ prefix)
wheel=$(basename $(ls dist/*.whl))
# Create a blank index page
echo '
' > dist/index.html
# Run a server for that dist/ folder
cd dist
python3 -m http.server 8529 &
cd ..
shot-scraper javascript http://localhost:8529/ ""
async () => {
let pyodide = await loadPyodide();
await pyodide.loadPackage(['micropip', 'ssl', 'setuptools']);
let output = await pyodide.runPythonAsync(\`
import micropip
await micropip.install('h11==0.12.0')
await micropip.install('http://localhost:8529/$wheel')
import ssl
import setuptools
from datasette.app import Datasette
ds = Datasette(memory=True, settings={'num_sql_threads': 0})
(await ds.client.get('/_memory.json?sql=select+55+as+itworks&_shape=array')).text
\`);
if (JSON.parse(output)[0].itworks != 55) {
throw 'Got ' + output + ', expected itworks: 55';
}
return 'Test passed!';
}
""
# Shut down the server
pkill -f 'http.server 8529'
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1223459734,
https://github.com/simonw/datasette/issues/1733#issuecomment-1115404729,https://api.github.com/repos/simonw/datasette/issues/1733,1115404729,IC_kwDOBm6k_c5Ce7m5,9599,2022-05-02T21:49:01Z,2022-05-02T21:49:38Z,OWNER,"That alpha release works!
https://pyodide.org/en/stable/console.html
```pycon
Welcome to the Pyodide terminal emulator 🐍
Python 3.10.2 (main, Apr 9 2022 20:52:01) on WebAssembly VM
Type ""help"", ""copyright"", ""credits"" or ""license"" for more information.
>>> import micropip
>>> await micropip.install(""datasette==0.62a0"")
>>> import ssl
>>> import setuptools
>>> from datasette.app import Datasette
>>> ds = Datasette(memory=True, settings={""num_sql_threads"": 0})
>>> await ds.client.get(""/.json"")
>>> (await ds.client.get(""/.json"")).json()
{'_memory': {'name': '_memory', 'hash': None, 'color': 'a6c7b9', 'path': '/_memory', 'tables_and_views_truncated': [], 'tables_and_views_more': False, 'tables_count': 0, 'table_rows_sum': 0, 'show_table_row_counts': False, 'hidden_table_rows_sum': 0, 'hidden_tables_count': 0, 'views_count': 0, 'private': False}}
>>>
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1223234932,
https://github.com/simonw/datasette/issues/1733#issuecomment-1115318417,https://api.github.com/repos/simonw/datasette/issues/1733,1115318417,IC_kwDOBm6k_c5CemiR,9599,2022-05-02T20:13:43Z,2022-05-02T20:13:43Z,OWNER,This is good enough to push an alpha.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1223234932,
https://github.com/simonw/datasette/issues/1733#issuecomment-1115318303,https://api.github.com/repos/simonw/datasette/issues/1733,1115318303,IC_kwDOBm6k_c5Cemgf,9599,2022-05-02T20:13:36Z,2022-05-02T20:13:36Z,OWNER,"I got a build from the `pyodide` branch to work!
```
Welcome to the Pyodide terminal emulator 🐍
Python 3.10.2 (main, Apr 9 2022 20:52:01) on WebAssembly VM
Type ""help"", ""copyright"", ""credits"" or ""license"" for more information.
>>> import micropip
>>> await micropip.install(""https://s3.amazonaws.com/simonwillison-cors-allowed-public/datasette-0.62a0-py3-none-any.whl"")
Traceback (most recent call last):
File """", line 1, in
File ""/lib/python3.10/asyncio/futures.py"", line 284, in __await__
yield self # This tells Task to wait for completion.
File ""/lib/python3.10/asyncio/tasks.py"", line 304, in __wakeup
future.result()
File ""/lib/python3.10/asyncio/futures.py"", line 201, in result
raise self._exception
File ""/lib/python3.10/asyncio/tasks.py"", line 234, in __step
result = coro.throw(exc)
File ""/lib/python3.10/site-packages/micropip/_micropip.py"", line 183, in install
transaction = await self.gather_requirements(requirements, ctx, keep_going)
File ""/lib/python3.10/site-packages/micropip/_micropip.py"", line 173, in gather_requirements
await gather(*requirement_promises)
File ""/lib/python3.10/asyncio/futures.py"", line 284, in __await__
yield self # This tells Task to wait for completion.
File ""/lib/python3.10/asyncio/tasks.py"", line 304, in __wakeup
future.result()
File ""/lib/python3.10/asyncio/futures.py"", line 201, in result
raise self._exception
File ""/lib/python3.10/asyncio/tasks.py"", line 232, in __step
result = coro.send(None)
File ""/lib/python3.10/site-packages/micropip/_micropip.py"", line 245, in add_requirement
await self.add_wheel(name, wheel, version, (), ctx, transaction)
File ""/lib/python3.10/site-packages/micropip/_micropip.py"", line 316, in add_wheel
await self.add_requirement(recurs_req, ctx, transaction)
File ""/lib/python3.10/site-packages/micropip/_micropip.py"", line 291, in add_requirement
await self.add_wheel(
File ""/lib/python3.10/site-packages/micropip/_micropip.py"", line 316, in add_wheel
await self.add_requirement(recurs_req, ctx, transaction)
File ""/lib/python3.10/site-packages/micropip/_micropip.py"", line 291, in add_requirement
await self.add_wheel(
File ""/lib/python3.10/site-packages/micropip/_micropip.py"", line 316, in add_wheel
await self.add_requirement(recurs_req, ctx, transaction)
File ""/lib/python3.10/site-packages/micropip/_micropip.py"", line 276, in add_requirement
raise ValueError(
ValueError: Requested 'h11<0.13,>=0.11', but h11==0.13.0 is already installed
>>> await micropip.install(""https://s3.amazonaws.com/simonwillison-cors-allowed-public/datasette-0.62a0-py3-none-any.whl"")
Traceback (most recent call last):
File """", line 1, in
File ""/lib/python3.10/asyncio/futures.py"", line 284, in __await__
yield self # This tells Task to wait for completion.
File ""/lib/python3.10/asyncio/tasks.py"", line 304, in __wakeup
future.result()
File ""/lib/python3.10/asyncio/futures.py"", line 201, in result
raise self._exception
File ""/lib/python3.10/asyncio/tasks.py"", line 234, in __step
result = coro.throw(exc)
File ""/lib/python3.10/site-packages/micropip/_micropip.py"", line 183, in install
transaction = await self.gather_requirements(requirements, ctx, keep_going)
File ""/lib/python3.10/site-packages/micropip/_micropip.py"", line 173, in gather_requirements
await gather(*requirement_promises)
File ""/lib/python3.10/asyncio/futures.py"", line 284, in __await__
yield self # This tells Task to wait for completion.
File ""/lib/python3.10/asyncio/tasks.py"", line 304, in __wakeup
future.result()
File ""/lib/python3.10/asyncio/futures.py"", line 201, in result
raise self._exception
File ""/lib/python3.10/asyncio/tasks.py"", line 232, in __step
result = coro.send(None)
File ""/lib/python3.10/site-packages/micropip/_micropip.py"", line 245, in add_requirement
await self.add_wheel(name, wheel, version, (), ctx, transaction)
File ""/lib/python3.10/site-packages/micropip/_micropip.py"", line 316, in add_wheel
await self.add_requirement(recurs_req, ctx, transaction)
File ""/lib/python3.10/site-packages/micropip/_micropip.py"", line 291, in add_requirement
await self.add_wheel(
File ""/lib/python3.10/site-packages/micropip/_micropip.py"", line 316, in add_wheel
await self.add_requirement(recurs_req, ctx, transaction)
File ""/lib/python3.10/site-packages/micropip/_micropip.py"", line 291, in add_requirement
await self.add_wheel(
File ""/lib/python3.10/site-packages/micropip/_micropip.py"", line 316, in add_wheel
await self.add_requirement(recurs_req, ctx, transaction)
File ""/lib/python3.10/site-packages/micropip/_micropip.py"", line 276, in add_requirement
raise ValueError(
ValueError: Requested 'h11<0.13,>=0.11', but h11==0.13.0 is already installed
>>> await micropip.install(""h11==0.12"")
>>> await micropip.install(""https://s3.amazonaws.com/simonwillison-cors-allowed-public/datasette-0.62a0-py3-none-any.whl"")
>>> import datasette
>>> from datasette.app import Datasette
Traceback (most recent call last):
File """", line 1, in
File ""/lib/python3.10/site-packages/datasette/app.py"", line 9, in
import httpx
File ""/lib/python3.10/site-packages/httpx/__init__.py"", line 2, in
from ._api import delete, get, head, options, patch, post, put, request, stream
File ""/lib/python3.10/site-packages/httpx/_api.py"", line 4, in
from ._client import Client
File ""/lib/python3.10/site-packages/httpx/_client.py"", line 9, in
from ._auth import Auth, BasicAuth, FunctionAuth
File ""/lib/python3.10/site-packages/httpx/_auth.py"", line 10, in
from ._models import Request, Response
File ""/lib/python3.10/site-packages/httpx/_models.py"", line 16, in
from ._content import ByteStream, UnattachedStream, encode_request, encode_response
File ""/lib/python3.10/site-packages/httpx/_content.py"", line 17, in
from ._multipart import MultipartStream
File ""/lib/python3.10/site-packages/httpx/_multipart.py"", line 7, in
from ._types import (
File ""/lib/python3.10/site-packages/httpx/_types.py"", line 5, in
import ssl
File ""/lib/python3.10/ssl.py"", line 98, in
import _ssl # if we can't import it, let the error propagate
ModuleNotFoundError: No module named '_ssl'
>>> import ssl
>>> from datasette.app import Datasette
Traceback (most recent call last):
File """", line 1, in
File ""/lib/python3.10/site-packages/datasette/app.py"", line 14, in
import pkg_resources
ModuleNotFoundError: No module named 'pkg_resources'
>>> import setuptools
>>> from datasette.app import Datasette
>>> ds = Datasette(memory=True)
>>> ds
>>> await ds.client.get(""/"")
Traceback (most recent call last):
File ""/lib/python3.10/site-packages/datasette/app.py"", line 1268, in route_path
response = await view(request, send)
File ""/lib/python3.10/site-packages/datasette/views/base.py"", line 134, in view
return await self.dispatch_request(request)
File ""/lib/python3.10/site-packages/datasette/views/base.py"", line 89, in dispatch_request
await self.ds.refresh_schemas()
File ""/lib/python3.10/site-packages/datasette/app.py"", line 353, in refresh_schemas
await self._refresh_schemas()
File ""/lib/python3.10/site-packages/datasette/app.py"", line 358, in _refresh_schemas
await init_internal_db(internal_db)
File ""/lib/python3.10/site-packages/datasette/utils/internal_db.py"", line 65, in init_internal_db
await db.execute_write_script(create_tables_sql)
File ""/lib/python3.10/site-packages/datasette/database.py"", line 116, in execute_write_script
results = await self.execute_write_fn(_inner, block=block)
File ""/lib/python3.10/site-packages/datasette/database.py"", line 155, in execute_write_fn
self._write_thread.start()
File ""/lib/python3.10/threading.py"", line 928, in start
_start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread
>>> ds = Datasette(memory=True, settings={""num_sql_threads"": 0})
>>> await ds.client.get(""/"")
>>> (await ds.client.get(""/"")).text
'\n\n\n Datasette: _memory\n \n \n\n\n\n
\n\n\n\n\n \n\n\n\n\n\n
Datasette
\n\n\n\n\n\n
r detailsClickedWithin = null;\n while (target && target.tagName != \'DETAILS\') {\n target = target.parentNode;\
n }\n if (target && target.tagName == \'DETAILS\') {\n detailsClickedWithin = target;\n }\n Array.from(d
ocument.getElementsByTagName(\'details\')).filter(\n (details) => details.open && details != detailsClickedWithin\n
).forEach(details => details.open = false);\n});\n\n\n\n\n\n\n
'
>>>
```
That `ValueError: Requested 'h11<0.13,>=0.11', but h11==0.13.0 is already installed` error is annoying. I assume it's a `uvicorn` dependency clash of some sort, because I wasn't getting that when I removed `uvicorn` as a dependency.
I can avoid it by running this first though:
await micropip.install(""h11==0.12"")","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1223234932,
https://github.com/simonw/datasette/issues/1735#issuecomment-1115301733,https://api.github.com/repos/simonw/datasette/issues/1735,1115301733,IC_kwDOBm6k_c5Ceidl,9599,2022-05-02T19:57:19Z,2022-05-02T19:59:03Z,OWNER,"This code breaks if that setting is 0:
https://github.com/simonw/datasette/blob/a29c1277896b6a7905ef5441c42a37bc15f67599/datasette/app.py#L291-L293
It's used here:
https://github.com/simonw/datasette/blob/a29c1277896b6a7905ef5441c42a37bc15f67599/datasette/database.py#L188-L190","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1223263540,
https://github.com/simonw/datasette/issues/1733#issuecomment-1115288284,https://api.github.com/repos/simonw/datasette/issues/1733,1115288284,IC_kwDOBm6k_c5CefLc,9599,2022-05-02T19:40:33Z,2022-05-02T19:40:33Z,OWNER,"I'll release this as a `0.62a0` as soon as it's ready, so I can start testing it out in Pyodide for real.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1223234932,
https://github.com/simonw/datasette/issues/1734#issuecomment-1115283922,https://api.github.com/repos/simonw/datasette/issues/1734,1115283922,IC_kwDOBm6k_c5CeeHS,9599,2022-05-02T19:35:32Z,2022-05-02T19:35:32Z,OWNER,I'll use my original from 2009: https://www.djangosnippets.org/snippets/1431/,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1223241647,
https://github.com/simonw/datasette/issues/1734#issuecomment-1115282773,https://api.github.com/repos/simonw/datasette/issues/1734,1115282773,IC_kwDOBm6k_c5Ced1V,9599,2022-05-02T19:34:15Z,2022-05-02T19:34:15Z,OWNER,I'm going to vendor it and update the documentation.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1223241647,
https://github.com/simonw/datasette/issues/1733#issuecomment-1115278325,https://api.github.com/repos/simonw/datasette/issues/1733,1115278325,IC_kwDOBm6k_c5Cecv1,9599,2022-05-02T19:29:05Z,2022-05-02T19:29:05Z,OWNER,"I'm going to add a Datasette setting to disable threading entirely, designed for usage in this particular case.
I thought about adding a new setting, then I noticed this:
datasette mydatabase.db --setting num_sql_threads 10
I'm going to let users set that to `0` to disable threaded execution of SQL queries.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1223234932,
https://github.com/simonw/datasette/issues/1733#issuecomment-1115268245,https://api.github.com/repos/simonw/datasette/issues/1733,1115268245,IC_kwDOBm6k_c5CeaSV,9599,2022-05-02T19:18:11Z,2022-05-02T19:18:11Z,OWNER,"Maybe I can leave `uvicorn` as a dependency? Installing it works OK, it only generates errors when you try to import it:
```pycon
Welcome to the Pyodide terminal emulator 🐍
Python 3.10.2 (main, Apr 9 2022 20:52:01) on WebAssembly VM
Type ""help"", ""copyright"", ""credits"" or ""license"" for more information.
>>> import micropip
>>> await micropip.install(""uvicorn"")
>>> import uvicorn
Traceback (most recent call last):
File """", line 1, in
File ""/lib/python3.10/site-packages/uvicorn/__init__.py"", line 1, in
from uvicorn.config import Config
File ""/lib/python3.10/site-packages/uvicorn/config.py"", line 8, in
import ssl
File ""/lib/python3.10/ssl.py"", line 98, in
import _ssl # if we can't import it, let the error propagate
ModuleNotFoundError: No module named '_ssl'
>>> import ssl
>>> import uvicorn
Traceback (most recent call last):
File """", line 1, in
File ""/lib/python3.10/site-packages/uvicorn/__init__.py"", line 2, in
from uvicorn.main import Server, main, run
File ""/lib/python3.10/site-packages/uvicorn/main.py"", line 24, in
from uvicorn.supervisors import ChangeReload, Multiprocess
File ""/lib/python3.10/site-packages/uvicorn/supervisors/__init__.py"", line 3, in
from uvicorn.supervisors.basereload import BaseReload
File ""/lib/python3.10/site-packages/uvicorn/supervisors/basereload.py"", line 12, in
from uvicorn.subprocess import get_subprocess
File ""/lib/python3.10/site-packages/uvicorn/subprocess.py"", line 14, in
multiprocessing.allow_connection_pickling()
File ""/lib/python3.10/multiprocessing/context.py"", line 170, in allow_connection_pickling
from . import connection
File ""/lib/python3.10/multiprocessing/connection.py"", line 21, in
import _multiprocessing
ModuleNotFoundError: No module named '_multiprocessing'
>>> import multiprocessing
>>> import uvicorn
Traceback (most recent call last):
File """", line 1, in
File ""/lib/python3.10/site-packages/uvicorn/__init__.py"", line 2, in
from uvicorn.main import Server, main, run
File ""/lib/python3.10/site-packages/uvicorn/main.py"", line 24, in
from uvicorn.supervisors import ChangeReload, Multiprocess
File ""/lib/python3.10/site-packages/uvicorn/supervisors/__init__.py"", line 3, in
from uvicorn.supervisors.basereload import BaseReload
File ""/lib/python3.10/site-packages/uvicorn/supervisors/basereload.py"", line 12, in
from uvicorn.subprocess import get_subprocess
File ""/lib/python3.10/site-packages/uvicorn/subprocess.py"", line 14, in
multiprocessing.allow_connection_pickling()
File ""/lib/python3.10/multiprocessing/context.py"", line 170, in allow_connection_pickling
from . import connection
File ""/lib/python3.10/multiprocessing/connection.py"", line 21, in
import _multiprocessing
ModuleNotFoundError: No module named '_multiprocessing'
>>>
```
Since the `import ssl` trick fixed the `_ssl` error I was hopeful that `import multiprocessing` could fix the `_multiprocessing` one, but sadly it did not.
But it looks like I can address this issue just by making `import uvicorn` in `app.py` an optional import.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1223234932,
https://github.com/simonw/datasette/issues/1733#issuecomment-1115262218,https://api.github.com/repos/simonw/datasette/issues/1733,1115262218,IC_kwDOBm6k_c5CeY0K,9599,2022-05-02T19:11:51Z,2022-05-02T19:14:01Z,OWNER,"Here's the full diff I applied to Datasette to get it fully working in Pyodide:
https://github.com/simonw/datasette/compare/94a3171b01fde5c52697aeeff052e3ad4bab5391...8af32bc5b03c30b1f7a4a8cc4bd80eb7e2ee7b81
And as a visible diff:
```diff
diff --git a/datasette/app.py b/datasette/app.py
index d269372..6c0c5fc 100644
--- a/datasette/app.py
+++ b/datasette/app.py
@@ -15,7 +15,6 @@ import pkg_resources
import re
import secrets
import sys
-import threading
import traceback
import urllib.parse
from concurrent import futures
@@ -26,7 +25,6 @@ from itsdangerous import URLSafeSerializer
from jinja2 import ChoiceLoader, Environment, FileSystemLoader, PrefixLoader
from jinja2.environment import Template
from jinja2.exceptions import TemplateNotFound
-import uvicorn
from .views.base import DatasetteError, ureg
from .views.database import DatabaseDownload, DatabaseView
@@ -813,7 +811,6 @@ class Datasette:
},
""datasette"": datasette_version,
""asgi"": ""3.0"",
- ""uvicorn"": uvicorn.__version__,
""sqlite"": {
""version"": sqlite_version,
""fts_versions"": fts_versions,
@@ -854,23 +851,7 @@ class Datasette:
]
def _threads(self):
- threads = list(threading.enumerate())
- d = {
- ""num_threads"": len(threads),
- ""threads"": [
- {""name"": t.name, ""ident"": t.ident, ""daemon"": t.daemon} for t in threads
- ],
- }
- # Only available in Python 3.7+
- if hasattr(asyncio, ""all_tasks""):
- tasks = asyncio.all_tasks()
- d.update(
- {
- ""num_tasks"": len(tasks),
- ""tasks"": [_cleaner_task_str(t) for t in tasks],
- }
- )
- return d
+ return {""num_threads"": 0, ""threads"": []}
def _actor(self, request):
return {""actor"": request.actor}
diff --git a/datasette/database.py b/datasette/database.py
index ba594a8..b50142d 100644
--- a/datasette/database.py
+++ b/datasette/database.py
@@ -4,7 +4,6 @@ from pathlib import Path
import janus
import queue
import sys
-import threading
import uuid
from .tracer import trace
@@ -21,8 +20,6 @@ from .utils import (
)
from .inspect import inspect_hash
-connections = threading.local()
-
AttachedDatabase = namedtuple(""AttachedDatabase"", (""seq"", ""name"", ""file""))
@@ -43,12 +40,12 @@ class Database:
self.hash = None
self.cached_size = None
self._cached_table_counts = None
- self._write_thread = None
- self._write_queue = None
if not self.is_mutable and not self.is_memory:
p = Path(path)
self.hash = inspect_hash(p)
self.cached_size = p.stat().st_size
+ self._read_connection = None
+ self._write_connection = None
@property
def cached_table_counts(self):
@@ -134,60 +131,17 @@ class Database:
return results
async def execute_write_fn(self, fn, block=True):
- task_id = uuid.uuid5(uuid.NAMESPACE_DNS, ""datasette.io"")
- if self._write_queue is None:
- self._write_queue = queue.Queue()
- if self._write_thread is None:
- self._write_thread = threading.Thread(
- target=self._execute_writes, daemon=True
- )
- self._write_thread.start()
- reply_queue = janus.Queue()
- self._write_queue.put(WriteTask(fn, task_id, reply_queue))
- if block:
- result = await reply_queue.async_q.get()
- if isinstance(result, Exception):
- raise result
- else:
- return result
- else:
- return task_id
-
- def _execute_writes(self):
- # Infinite looping thread that protects the single write connection
- # to this database
- conn_exception = None
- conn = None
- try:
- conn = self.connect(write=True)
- self.ds._prepare_connection(conn, self.name)
- except Exception as e:
- conn_exception = e
- while True:
- task = self._write_queue.get()
- if conn_exception is not None:
- result = conn_exception
- else:
- try:
- result = task.fn(conn)
- except Exception as e:
- sys.stderr.write(""{}\n"".format(e))
- sys.stderr.flush()
- result = e
- task.reply_queue.sync_q.put(result)
+ # We always treat it as if block=True now
+ if self._write_connection is None:
+ self._write_connection = self.connect(write=True)
+ self.ds._prepare_connection(self._write_connection, self.name)
+ return fn(self._write_connection)
async def execute_fn(self, fn):
- def in_thread():
- conn = getattr(connections, self.name, None)
- if not conn:
- conn = self.connect()
- self.ds._prepare_connection(conn, self.name)
- setattr(connections, self.name, conn)
- return fn(conn)
-
- return await asyncio.get_event_loop().run_in_executor(
- self.ds.executor, in_thread
- )
+ if self._read_connection is None:
+ self._read_connection = self.connect()
+ self.ds._prepare_connection(self._read_connection, self.name)
+ return fn(self._read_connection)
async def execute(
self,
diff --git a/setup.py b/setup.py
index 7f0562f..c41669c 100644
--- a/setup.py
+++ b/setup.py
@@ -44,20 +44,20 @@ setup(
install_requires=[
""asgiref>=3.2.10,<3.6.0"",
""click>=7.1.1,<8.2.0"",
- ""click-default-group~=1.2.2"",
+ # ""click-default-group~=1.2.2"",
""Jinja2>=2.10.3,<3.1.0"",
""hupper~=1.9"",
""httpx>=0.20"",
""pint~=0.9"",
""pluggy>=1.0,<1.1"",
- ""uvicorn~=0.11"",
+ # ""uvicorn~=0.11"",
""aiofiles>=0.4,<0.9"",
""janus>=0.6.2,<1.1"",
""asgi-csrf>=0.9"",
""PyYAML>=5.3,<7.0"",
""mergedeep>=1.1.1,<1.4.0"",
""itsdangerous>=1.1,<3.0"",
- ""python-baseconv==1.2.2"",
+ # ""python-baseconv==1.2.2"",
],
entry_points=""""""
[console_scripts]
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1223234932,
https://github.com/simonw/datasette/issues/1734#issuecomment-1115260999,https://api.github.com/repos/simonw/datasette/issues/1734,1115260999,IC_kwDOBm6k_c5CeYhH,9599,2022-05-02T19:10:34Z,2022-05-02T19:10:34Z,OWNER,"This is actually mostly a documentation thing: here: https://docs.datasette.io/en/0.61.1/authentication.html#including-an-expiry-time
In the code it's only used in these two places:
https://github.com/simonw/datasette/blob/0a7621f96f8ad14da17e7172e8a7bce24ef78966/datasette/actor_auth_cookie.py#L16-L20
https://github.com/simonw/datasette/blob/0a7621f96f8ad14da17e7172e8a7bce24ef78966/tests/test_auth.py#L56-L60","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1223241647,
https://github.com/simonw/datasette/issues/1733#issuecomment-1115258737,https://api.github.com/repos/simonw/datasette/issues/1733,1115258737,IC_kwDOBm6k_c5CeX9x,9599,2022-05-02T19:08:17Z,2022-05-02T19:08:17Z,OWNER,"I was going to vendor `baseconv.py`, but then I reconsidered - what if there are plugins out there that expect `import baseconv` to work because they have depended on Datasette?
I used https://cs.github.com/ and as far as I can tell there aren't any!
So I'm going to remove that dependency and work out a smarter way to do this - probably by providing a utility function within Datasette itself.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1223234932,
https://github.com/simonw/datasette/issues/1733#issuecomment-1115256318,https://api.github.com/repos/simonw/datasette/issues/1733,1115256318,IC_kwDOBm6k_c5CeXX-,9599,2022-05-02T19:05:55Z,2022-05-02T19:05:55Z,OWNER,"I released a `click-default-group-wheel` package to solve that dependency issue. I've already upgraded `sqlite-utils` to that, so now you can use that in Pyodide:
- https://github.com/simonw/sqlite-utils/pull/429
`python-baseconv` is only used for actor cookie expiration times:
https://github.com/simonw/datasette/blob/0a7621f96f8ad14da17e7172e8a7bce24ef78966/datasette/actor_auth_cookie.py#L16-L20
Datasette never actually sets that cookie itself - it instead encourages plugins to set it in the authentication documentation here: https://docs.datasette.io/en/0.61.1/authentication.html#including-an-expiry-time","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1223234932,
https://github.com/simonw/sqlite-utils/pull/429#issuecomment-1115196863,https://api.github.com/repos/simonw/sqlite-utils/issues/429,1115196863,IC_kwDOCGYnMM5CeI2_,9599,2022-05-02T18:03:47Z,2022-05-02T18:52:42Z,OWNER,"I made a build of this branch and tested it like this: https://pyodide.org/en/stable/console.html
```pycon
>>> import micropip
>>> await micropip.install(""https://s3.amazonaws.com/simonwillison-cors-allowed-public/sqlite_utils-3.26-py3-none-any.whl"")
>>> import sqlite_utils
>>> db = sqlite_utils.Database(memory=True)
>>> list(db.query(""select 32443 + 55""))
[{'32443 + 55': 32498}]
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1223177069,
https://github.com/simonw/sqlite-utils/pull/429#issuecomment-1115197644,https://api.github.com/repos/simonw/sqlite-utils/issues/429,1115197644,IC_kwDOCGYnMM5CeJDM,9599,2022-05-02T18:04:28Z,2022-05-02T18:04:28Z,OWNER,I'm going to ship this straight away as `3.26.1`.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1223177069,
https://github.com/simonw/datasette/issues/1727#issuecomment-1114058210,https://api.github.com/repos/simonw/datasette/issues/1727,1114058210,IC_kwDOBm6k_c5CZy3i,9599,2022-04-30T21:39:34Z,2022-04-30T21:39:34Z,OWNER,"Something to consider if I look into subprocesses for parallel query execution:
https://sqlite.org/howtocorrupt.html#_carrying_an_open_database_connection_across_a_fork_
> Do not open an SQLite database connection, then fork(), then try to use that database connection in the child process. All kinds of locking problems will result and you can easily end up with a corrupt database. SQLite is not designed to support that kind of behavior. Any database connection that is used in a child process must be opened in the child process, not inherited from the parent. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1217759117,
https://github.com/simonw/datasette/issues/1729#issuecomment-1114038259,https://api.github.com/repos/simonw/datasette/issues/1729,1114038259,IC_kwDOBm6k_c5CZt_z,9599,2022-04-30T19:06:03Z,2022-04-30T19:06:03Z,OWNER,"> but actually the facet results would be better if they were a list rather than a dictionary
I think `facet_results` in the JSON should match this (used by the HTML) instead:
https://github.com/simonw/datasette/blob/942411ef946e9a34a2094944d3423cddad27efd3/datasette/views/table.py#L737-L741
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1219385669,
https://github.com/simonw/datasette/issues/1729#issuecomment-1114036946,https://api.github.com/repos/simonw/datasette/issues/1729,1114036946,IC_kwDOBm6k_c5CZtrS,9599,2022-04-30T18:56:25Z,2022-04-30T19:04:03Z,OWNER,"Related:
- #1558
Which talks about how there was confusion in this example: https://latest.datasette.io/fixtures/facetable.json?_facet=created&_facet_date=created&_facet=tags&_facet_array=tags&_nosuggest=1&_size=0
Which I fixed in #625 by introducing `tags` and `tags_2` keys, but actually the facet results would be better if they were a list rather than a dictionary.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1219385669,
https://github.com/simonw/datasette/issues/1729#issuecomment-1114037521,https://api.github.com/repos/simonw/datasette/issues/1729,1114037521,IC_kwDOBm6k_c5CZt0R,9599,2022-04-30T19:01:07Z,2022-04-30T19:01:07Z,OWNER,"I had to look up what `hideable` means - it means that you can't hide the current facet because it was defined in metadata, not as a `?_facet=` parameter:
https://github.com/simonw/datasette/blob/4e47a2d894b96854348343374c8e97c9d7055cf6/datasette/facets.py#L228
That's a bit of a weird thing to expose in the API. Maybe change that to `source` so it can be `metadata` or `request`? That's very slightly less coupled to how the UI works.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1219385669,
https://github.com/simonw/datasette/issues/1729#issuecomment-1114013757,https://api.github.com/repos/simonw/datasette/issues/1729,1114013757,IC_kwDOBm6k_c5CZoA9,9599,2022-04-30T16:15:51Z,2022-04-30T18:54:39Z,OWNER,"Deployed a preview of this here: https://latest-1-0-alpha.datasette.io/
Examples:
- https://latest-1-0-alpha.datasette.io/fixtures/facetable.json
- https://latest-1-0-alpha.datasette.io/fixtures/facetable.json?_facet=state&_size=0&_extra=facet_results&_extra=count
Second example produces:
```json
{
""rows"": [],
""next"": null,
""next_url"": null,
""count"": 15,
""facet_results"": {
""state"": {
""name"": ""state"",
""type"": ""column"",
""hideable"": true,
""toggle_url"": ""/fixtures/facetable.json?_size=0&_extra=facet_results&_extra=count"",
""results"": [
{
""value"": ""CA"",
""label"": ""CA"",
""count"": 10,
""toggle_url"": ""https://latest-1-0-alpha.datasette.io/fixtures/facetable.json?_facet=state&_size=0&_extra=facet_results&_extra=count&state=CA"",
""selected"": false
},
{
""value"": ""MI"",
""label"": ""MI"",
""count"": 4,
""toggle_url"": ""https://latest-1-0-alpha.datasette.io/fixtures/facetable.json?_facet=state&_size=0&_extra=facet_results&_extra=count&state=MI"",
""selected"": false
},
{
""value"": ""MC"",
""label"": ""MC"",
""count"": 1,
""toggle_url"": ""https://latest-1-0-alpha.datasette.io/fixtures/facetable.json?_facet=state&_size=0&_extra=facet_results&_extra=count&state=MC"",
""selected"": false
}
],
""truncated"": false
}
}
}
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1219385669,
https://github.com/simonw/datasette/issues/1727#issuecomment-1112889800,https://api.github.com/repos/simonw/datasette/issues/1727,1112889800,IC_kwDOBm6k_c5CVVnI,9599,2022-04-29T05:29:38Z,2022-04-29T05:29:38Z,OWNER,"OK, I just got the most incredible result with that!
I started up a container running `bash` like this, from my `datasette` checkout. I'm mapping port 8005 on my laptop to port 8001 inside the container because laptop port 8001 was already doing something else:
```
docker run -it --rm --name my-running-script -p 8005:8001 -v ""$PWD"":/usr/src/myapp \
-w /usr/src/myapp nogil/python bash
```
Then in `bash` I ran the following commands to install Datasette and its dependencies:
```
pip install -e '.[test]'
pip install datasette-pretty-traces # For debug tracing
```
Then I started Datasette against my `github.db` database (from github-to-sqlite.dogsheep.net/github.db) like this:
```
datasette github.db -h 0.0.0.0 --setting trace_debug 1
```
I hit the following two URLs to compare the parallel v.s. not parallel implementations:
- `http://127.0.0.1:8005/github/issues?_facet=milestone&_facet=repo&_trace=1&_size=10`
- `http://127.0.0.1:8005/github/issues?_facet=milestone&_facet=repo&_trace=1&_size=10&_noparallel=1`
And... the parallel one beat the non-parallel one decisively, on multiple page refreshes!
Not parallel: 77ms
Parallel: 47ms
So yeah, I'm very confident this is a problem with the GIL. And I am absolutely **stunned** that @colesbury's fork ran Datasette (which has some reasonably tricky threading and async stuff going on) out of the box!","{""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1217759117,
https://github.com/simonw/datasette/issues/1727#issuecomment-1112879463,https://api.github.com/repos/simonw/datasette/issues/1727,1112879463,IC_kwDOBm6k_c5CVTFn,9599,2022-04-29T05:03:58Z,2022-04-29T05:03:58Z,OWNER,"It would be _really_ fun to try running this with the in-development `nogil` Python from https://github.com/colesbury/nogil
There's a Docker container for it: https://hub.docker.com/r/nogil/python
It suggests you can run something like this:
docker run -it --rm --name my-running-script -v ""$PWD"":/usr/src/myapp \
-w /usr/src/myapp nogil/python python your-daemon-or-script.py","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1217759117,
https://github.com/simonw/datasette/issues/1727#issuecomment-1112878955,https://api.github.com/repos/simonw/datasette/issues/1727,1112878955,IC_kwDOBm6k_c5CVS9r,9599,2022-04-29T05:02:40Z,2022-04-29T05:02:40Z,OWNER,"Here's a very useful (recent) article about how the GIL works and how to think about it: https://pythonspeed.com/articles/python-gil/ - via https://lobste.rs/s/9hj80j/when_python_can_t_thread_deep_dive_into_gil
From that article:
> For example, let's consider an extension module written in C or Rust that lets you talk to a PostgreSQL database server.
>
> Conceptually, handling a SQL query with this library will go through three steps:
>
> 1. Deserialize from Python to the internal library representation. Since this will be reading Python objects, it needs to hold the GIL.
> 2. Send the query to the database server, and wait for a response. This doesn't need the GIL.
> 3. Convert the response into Python objects. This needs the GIL again.
>
> As you can see, how much parallelism you can get depends on how much time is spent in each step. If the bulk of time is spent in step 2, you'll get parallelism there. But if, for example, you run a `SELECT` and get a large number of rows back, the library will need to create many Python objects, and step 3 will have to hold GIL for a while.
That explains what I'm seeing here. I'm pretty convinced now that the reason I'm not getting a performance boost from parallel queries is that there's more time spent in Python code assembling the results than in SQLite C code executing the query.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1217759117,
https://github.com/simonw/datasette/issues/1729#issuecomment-1112734577,https://api.github.com/repos/simonw/datasette/issues/1729,1112734577,IC_kwDOBm6k_c5CUvtx,9599,2022-04-28T23:08:42Z,2022-04-28T23:08:42Z,OWNER,"That prototype is a very small amount of code so far:
```diff
diff --git a/datasette/renderer.py b/datasette/renderer.py
index 4508949..b600e1b 100644
--- a/datasette/renderer.py
+++ b/datasette/renderer.py
@@ -28,6 +28,10 @@ def convert_specific_columns_to_json(rows, columns, json_cols):
def json_renderer(args, data, view_name):
""""""Render a response as JSON""""""
+ from pprint import pprint
+
+ pprint(data)
+
status_code = 200
# Handle the _json= parameter which may modify data[""rows""]
@@ -43,6 +47,41 @@ def json_renderer(args, data, view_name):
if ""rows"" in data and not value_as_boolean(args.get(""_json_infinity"", ""0"")):
data[""rows""] = [remove_infinites(row) for row in data[""rows""]]
+ # Start building the default JSON here
+ columns = data[""columns""]
+ next_url = data.get(""next_url"")
+ output = {
+ ""rows"": [dict(zip(columns, row)) for row in data[""rows""]],
+ ""next"": data[""next""],
+ ""next_url"": next_url,
+ }
+
+ extras = set(args.getlist(""_extra""))
+
+ extras_map = {
+ # _extra= : data[field]
+ ""count"": ""filtered_table_rows_count"",
+ ""facet_results"": ""facet_results"",
+ ""suggested_facets"": ""suggested_facets"",
+ ""columns"": ""columns"",
+ ""primary_keys"": ""primary_keys"",
+ ""query_ms"": ""query_ms"",
+ ""query"": ""query"",
+ }
+ for extra_key, data_key in extras_map.items():
+ if extra_key in extras:
+ output[extra_key] = data[data_key]
+
+ body = json.dumps(output, cls=CustomJSONEncoder)
+ content_type = ""application/json; charset=utf-8""
+ headers = {}
+ if next_url:
+ headers[""link""] = f'<{next_url}>; rel=""next""'
+ return Response(
+ body, status=status_code, headers=headers, content_type=content_type
+ )
+
+
# Deal with the _shape option
shape = args.get(""_shape"", ""arrays"")
# if there's an error, ignore the shape entirely
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1219385669,
https://github.com/simonw/datasette/issues/1729#issuecomment-1112732563,https://api.github.com/repos/simonw/datasette/issues/1729,1112732563,IC_kwDOBm6k_c5CUvOT,9599,2022-04-28T23:05:03Z,2022-04-28T23:05:03Z,OWNER,"OK, the prototype of this is looking really good - it's very pleasant to use.
`http://127.0.0.1:8001/github_memory/issue_comments.json?_search=simon&_sort=id&_size=5&_extra=query_ms&_extra=count&_col=body` returns this:
```json
{
""rows"": [
{
""id"": 338854988,
""body"": "" /database-name/table-name?name__contains=simon&sort=id+desc\r\n\r\nNote that if there's a column called \""sort\"" you can still do sort__exact=blah\r\n\r\n""
},
{
""id"": 346427794,
""body"": ""Thanks. There is a way to use pip to grab apsw, which also let's you configure it (flags to build extensions, use an internal sqlite, etc). Don't know how that works as a dependency for another package, though.\n\nOn November 22, 2017 11:38:06 AM EST, Simon Willison wrote:\n>I have a solution for FTS already, but I'm interested in apsw as a\n>mechanism for allowing custom virtual tables to be written in Python\n>(pysqlite only lets you write custom functions)\n>\n>Not having PyPI support is pretty tough though. I'm planning a\n>plugin/extension system which would be ideal for things like an\n>optional apsw mode, but that's a lot harder if apsw isn't in PyPI.\n>\n>-- \n>You are receiving this because you authored the thread.\n>Reply to this email directly or view it on GitHub:\n>https://github.com/simonw/datasette/issues/144#issuecomment-346405660\n""
},
{
""id"": 348252037,
""body"": ""WOW!\n\n--\nPaul Ford // (646) 369-7128 // @ftrain\n\nOn Thu, Nov 30, 2017 at 11:47 AM, Simon Willison \nwrote:\n\n> Remaining work on this now lives in a milestone:\n> https://github.com/simonw/datasette/milestone/6\n>\n> —\n> You are receiving this because you were mentioned.\n> Reply to this email directly, view it on GitHub\n> ,\n> or mute the thread\n> \n> .\n>\n""
},
{
""id"": 391141391,
""body"": ""I'm going to clean this up for consistency tomorrow morning so hold off\nmerging until then please\n\nOn Tue, May 22, 2018 at 6:34 PM, Simon Willison \nwrote:\n\n> Yeah let's try this without pysqlite3 and see if we still get the correct\n> version.\n>\n> —\n> You are receiving this because you authored the thread.\n> Reply to this email directly, view it on GitHub\n> , or mute\n> the thread\n> \n> .\n>\n""
},
{
""id"": 391355030,
""body"": ""No objections;\r\nIt's good to go @simonw\r\n\r\nOn Wed, 23 May 2018, 14:51 Simon Willison, wrote:\r\n\r\n> @r4vi any objections to me merging this?\r\n>\r\n> —\r\n> You are receiving this because you were mentioned.\r\n> Reply to this email directly, view it on GitHub\r\n> , or mute\r\n> the thread\r\n> \r\n> .\r\n>\r\n""
}
],
""next"": ""391355030,391355030"",
""next_url"": ""http://127.0.0.1:8001/github_memory/issue_comments.json?_search=simon&_size=5&_extra=query_ms&_extra=count&_col=body&_next=391355030%2C391355030&_sort=id"",
""count"": 57,
""query_ms"": 21.780223003588617
}
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1219385669,
https://github.com/simonw/datasette/issues/1729#issuecomment-1112730416,https://api.github.com/repos/simonw/datasette/issues/1729,1112730416,IC_kwDOBm6k_c5CUusw,9599,2022-04-28T23:01:21Z,2022-04-28T23:01:21Z,OWNER,"I'm not sure what to do about the `""truncated"": true/false` key.
It's not really relevant to table results, since they are paginated whether or not you ask for them to be.
It plays a role in query results, where you might run `select * from table` and get back 1000 results because Datasette truncates at that point rather than returning everything.
Adding it to every table result and always setting it to `""truncated"": false` feels confusing.
I think I'm going to keep it exclusively in the default representation for the `/db?sql=...` query endpoint, and not return it at all for tables.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1219385669,
https://github.com/simonw/datasette/issues/1729#issuecomment-1112721321,https://api.github.com/repos/simonw/datasette/issues/1729,1112721321,IC_kwDOBm6k_c5CUsep,9599,2022-04-28T22:44:05Z,2022-04-28T22:44:14Z,OWNER,I may be able to implement this mostly in the `json_renderer()` function: https://github.com/simonw/datasette/blob/94a3171b01fde5c52697aeeff052e3ad4bab5391/datasette/renderer.py#L29-L34,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1219385669,
https://github.com/simonw/datasette/issues/1729#issuecomment-1112717745,https://api.github.com/repos/simonw/datasette/issues/1729,1112717745,IC_kwDOBm6k_c5CUrmx,9599,2022-04-28T22:38:39Z,2022-04-28T22:39:05Z,OWNER,"(I remain keen on the idea of shipping a plugin that restores the old default API shape to people who have written pre-Datasette-1.0 code against it, but I'll tackle that much later. I really like how jQuery has a culture of doing this.)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1219385669,
https://github.com/simonw/datasette/issues/1729#issuecomment-1112717210,https://api.github.com/repos/simonw/datasette/issues/1729,1112717210,IC_kwDOBm6k_c5CUrea,9599,2022-04-28T22:37:37Z,2022-04-28T22:37:37Z,OWNER,"This means `filtered_table_rows_count` is going to become `count`. I had originally picked that terrible name to avoid confusion between the count of all rows in the table and the count of rows that were filtered.
I'll add `?_extra=table_count` for getting back the full table count instead. I think `count` is clear enough!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1219385669,
https://github.com/simonw/datasette/issues/1729#issuecomment-1112716611,https://api.github.com/repos/simonw/datasette/issues/1729,1112716611,IC_kwDOBm6k_c5CUrVD,9599,2022-04-28T22:36:24Z,2022-04-28T22:36:24Z,OWNER,"Then I'm going to implement the following `?_extra=` options:
- `?_extra=facet_results` - to see facet results
- `?_extra=suggested_facets` - for suggested facets
- `?_extra=count` - for the count of total rows
- `?_extra=columns` - for a list of column names
- `?_extra=primary_keys` - for a list of primary keys
- `?_extra=query` - a `{""sql"": ""select ..."", ""params"": {}}` object
I thought about having `?_extra=facet_results` returned automatically if the user specifies at least one `?_facet` - but that doesn't work for default facets configured in `metadata.json` - how can the user opt out of those being returned? So I'm going to say you don't see facets at all if you don't include `?_extra=facet_results`.
I'm tempted to add `?_extra=_all` to return everything, but I can decide if that's a good idea later.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1219385669,
https://github.com/simonw/datasette/issues/1729#issuecomment-1112713581,https://api.github.com/repos/simonw/datasette/issues/1729,1112713581,IC_kwDOBm6k_c5CUqlt,9599,2022-04-28T22:31:11Z,2022-04-28T22:31:11Z,OWNER,"I'm going to change the default API response to look like this:
```json
{
""rows"": [
{
""pk"": 1,
""created"": ""2019-01-14 08:00:00"",
""planet_int"": 1,
""on_earth"": 1,
""state"": ""CA"",
""_city_id"": 1,
""_neighborhood"": ""Mission"",
""tags"": ""[\""tag1\"", \""tag2\""]"",
""complex_array"": ""[{\""foo\"": \""bar\""}]"",
""distinct_some_null"": ""one"",
""n"": ""n1""
},
{
""pk"": 2,
""created"": ""2019-01-14 08:00:00"",
""planet_int"": 1,
""on_earth"": 1,
""state"": ""CA"",
""_city_id"": 1,
""_neighborhood"": ""Dogpatch"",
""tags"": ""[\""tag1\"", \""tag3\""]"",
""complex_array"": ""[]"",
""distinct_some_null"": ""two"",
""n"": ""n2""
}
],
""next"": null,
""next_url"": null
}
```
Basically https://latest.datasette.io/fixtures/facetable.json?_shape=objects but with just the `rows`, `next` and `next_url` fields returned by default.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1219385669,
https://github.com/simonw/datasette/issues/1715#issuecomment-1112711115,https://api.github.com/repos/simonw/datasette/issues/1715,1112711115,IC_kwDOBm6k_c5CUp_L,9599,2022-04-28T22:26:56Z,2022-04-28T22:26:56Z,OWNER,"I'm not going to use `asyncinject` in this refactor - at least not until I really need it. My research in these issues has put me off the idea (in favour of `asyncio.gather()` or even not trying for parallel execution at all):
- #1727","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1212823665,
https://github.com/simonw/datasette/issues/1727#issuecomment-1112668411,https://api.github.com/repos/simonw/datasette/issues/1727,1112668411,IC_kwDOBm6k_c5CUfj7,9599,2022-04-28T21:25:34Z,2022-04-28T21:25:44Z,OWNER,"The two most promising theories at the moment, from here and Twitter and the SQLite forum, are:
- SQLite is I/O bound - it generally only goes as fast as it can load data from disk. Multiple connections all competing for the same file on disk are going to end up blocked at the file system layer. But maybe this means in-memory databases will perform better?
- It's the GIL. The sqlite3 C code may release the GIL, but the bits that do things like assembling `Row` objects to return still happen in Python, and that Python can only run on a single core.
A couple of ways to research the in-memory theory:
- Use a RAM disk on macOS (or Linux). https://stackoverflow.com/a/2033417/6083 has instructions - short version:
hdiutil attach -nomount ram://$((2 * 1024 * 100))
diskutil eraseVolume HFS+ RAMDisk name-returned-by-previous-command (was `/dev/disk2` when I tried it)
cd /Volumes/RAMDisk
cp ~/fixtures.db .
- Copy Datasette databases into an in-memory database on startup. I built a new plugin to do that here: https://github.com/simonw/datasette-copy-to-memory
I need to do some more, better benchmarks using these different approaches.
https://twitter.com/laurencerowe/status/1519780174560169987 also suggests:
> Maybe try:
> 1. Copy the sqlite file to /dev/shm and rerun (all in ram.)
> 2. Create a CTE which calculates Fibonacci or similar so you can test something completely cpu bound (only return max value or something to avoid crossing between sqlite/Python.)
I like that second idea a lot - I could use the mandelbrot example from https://www.sqlite.org/lang_with.html#outlandish_recursive_query_examples","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1217759117,
https://github.com/simonw/datasette/issues/1727#issuecomment-1111726586,https://api.github.com/repos/simonw/datasette/issues/1727,1111726586,IC_kwDOBm6k_c5CQ5n6,9599,2022-04-28T04:17:16Z,2022-04-28T04:19:31Z,OWNER,"I could experiment with the `await asyncio.run_in_executor(processpool_executor, fn)` mechanism described in https://stackoverflow.com/a/29147750
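A rough sketch of that pattern (hypothetical and untested here; the worker function has to live at module level so the process pool can pickle it):
```python
import asyncio
import sqlite3
from concurrent.futures import ProcessPoolExecutor

def run_query(db_path, sql):
    # Runs in a worker process, so heavy Python work here does not hold the parent GIL
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.close()

async def main():
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as pool:
        rows = await loop.run_in_executor(
            pool, run_query, 'fixtures.db', 'select count(*) from facetable'
        )
        print(rows)

if __name__ == '__main__':
    asyncio.run(main())
```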
Code examples: https://cs.github.com/?scopeName=All+repos&scope=&q=run_in_executor+ProcessPoolExecutor","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1217759117,
https://github.com/simonw/datasette/issues/1727#issuecomment-1111725638,https://api.github.com/repos/simonw/datasette/issues/1727,1111725638,IC_kwDOBm6k_c5CQ5ZG,9599,2022-04-28T04:15:15Z,2022-04-28T04:15:15Z,OWNER,"Useful theory from Keith Medcalf https://sqlite.org/forum/forumpost/e363c69d3441172e
> This is true, but the concurrency is limited to the execution which occurs with the GIL released (that is, in the native C sqlite3 library itself). Each row (for example) can be retrieved in parallel but ""constructing the python return objects for each row"" will be serialized (by the GIL).
>
> That is to say that if your have two python threads each with their own connection, and each one is performing a select that returns 1,000,000 rows (lets say that is 25% of the candidates for each select) then the difference in execution time between executing two python threads in parallel vs a single serial thead will not be much different (if even detectable at all). In fact it is possible that the multiple-threaded version takes longer to run both queries to completion because of the increased contention over a shared resource (the GIL).
So maybe this is a GIL thing.
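One crude way to check would be a benchmark along these lines (a sketch I have not run; the recursive CTE keeps the work CPU-bound rather than I/O-bound):
```python
import sqlite3
import threading
import time

# Deliberately CPU-bound query: a recursive CTE that counts to two million
SQL = ('with recursive c(n) as (select 1 union all select n + 1 from c where n < 2000000) '
       'select max(n) from c')

def run_query():
    conn = sqlite3.connect('fixtures.db')
    conn.execute(SQL).fetchall()
    conn.close()

start = time.perf_counter()
run_query()
run_query()
print(f'serial: {time.perf_counter() - start:.2f}s')

start = time.perf_counter()
threads = [threading.Thread(target=run_query) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f'two threads: {time.perf_counter() - start:.2f}s')
```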
I should test with some expensive SQL queries (maybe big aggregations against large tables) and see if I can spot an improvement there.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1217759117,
https://github.com/simonw/datasette/issues/1728#issuecomment-1111714665,https://api.github.com/repos/simonw/datasette/issues/1728,1111714665,IC_kwDOBm6k_c5CQ2tp,9599,2022-04-28T03:52:47Z,2022-04-28T03:52:58Z,OWNER,"Nice custom template/theme!
Yeah, for that I'd recommend hosting elsewhere - on a regular VPS (I use `systemd` like this: https://docs.datasette.io/en/stable/deploying.html#running-datasette-using-systemd ) or using Fly if you want to run containers without managing a full server.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1218133366,
https://github.com/simonw/datasette/issues/1728#issuecomment-1111708206,https://api.github.com/repos/simonw/datasette/issues/1728,1111708206,IC_kwDOBm6k_c5CQ1Iu,9599,2022-04-28T03:38:56Z,2022-04-28T03:38:56Z,OWNER,"In terms of this bug, there are a few potential fixes:
1. Detect the write to an immutable database and show the user a proper, meaningful error message in the red error box at the top of the page
2. Don't allow the user to even submit the form - show a message saying that this canned query is unavailable because the database cannot be written to
3. Don't even allow Datasette to start running at all - if there's a canned query configured in `metadata.yml` and the database it refers to is in `-i` immutable mode throw an error on startup
I'm not keen on that last one because it would be frustrating if you couldn't launch Datasette just because you had an old canned query lying around in your metadata file.
So I'm leaning towards option 2.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1218133366,
https://github.com/simonw/datasette/issues/1728#issuecomment-1111707384,https://api.github.com/repos/simonw/datasette/issues/1728,1111707384,IC_kwDOBm6k_c5CQ074,9599,2022-04-28T03:36:46Z,2022-04-28T03:36:56Z,OWNER,"A more realistic solution (which I've been using on several of my own projects) is to keep the data itself in GitHub and encourage users to edit it there - using the GitHub web interface to edit YAML files or similar.
Needs your users to be comfortable hand-editing YAML though! You can at least guard against critical errors by having CI run tests against their YAML before deploying.
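A minimal CI guard could be something like this (a sketch, assuming a `metadata.yml` at the repo root):
```python
# Sketch of a CI test: fail the build if metadata.yml is not valid YAML
import yaml

def test_metadata_yaml_parses():
    with open('metadata.yml') as f:
        metadata = yaml.safe_load(f)
    assert isinstance(metadata, dict)
```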
I have a dream of building a more friendly web forms interface which edits the YAML back on GitHub for the user, but that's just a concept at the moment.
Even more fun would be if a user-friendly form could submit PRs for review without the user having to know what a PR is!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1218133366,
https://github.com/simonw/datasette/issues/1728#issuecomment-1111706519,https://api.github.com/repos/simonw/datasette/issues/1728,1111706519,IC_kwDOBm6k_c5CQ0uX,9599,2022-04-28T03:34:49Z,2022-04-28T03:34:49Z,OWNER,"I've wanted to do stuff like that on Cloud Run too. So far I've assumed that it's not feasible, but recently I've been wondering how hard it would be to have a small (like less than 100KB or so) Datasette instance which persists data to a backing GitHub repository such that when it starts up it can pull the latest copy and any time someone edits it can push their changes.
I'm still not sure it would work well on Cloud Run due to the uncertainty at what would happen if Cloud Run decided to boot up a second instance - but it's still an interesting thought exercise.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1218133366,
https://github.com/simonw/datasette/issues/1728#issuecomment-1111705069,https://api.github.com/repos/simonw/datasette/issues/1728,1111705069,IC_kwDOBm6k_c5CQ0Xt,9599,2022-04-28T03:31:33Z,2022-04-28T03:31:33Z,OWNER,"Confirmed - this is a bug where immutable databases fail to show a useful error if you write to them with a canned query.
Steps to reproduce:
```
echo '
databases:
writable:
queries:
add_name:
sql: insert into names(name) values (:name)
write: true
' > write-metadata.yml
echo '{""name"": ""Simon""}' | sqlite-utils insert writable.db names -
datasette writable.db -m write-metadata.yml
```
Then visit http://127.0.0.1:8001/writable/add_name - adding names works.
Now do this instead:
```
datasette -i writable.db -m write-metadata.yml
```
And I'm getting a broken error:
![error](https://user-images.githubusercontent.com/9599/165670823-6604dd69-9905-475c-8098-5da22ab026a1.gif)
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1218133366,
https://github.com/simonw/datasette/issues/1727#issuecomment-1111699175,https://api.github.com/repos/simonw/datasette/issues/1727,1111699175,IC_kwDOBm6k_c5CQy7n,9599,2022-04-28T03:19:48Z,2022-04-28T03:20:08Z,OWNER,"I ran `py-spy` and then hammered refresh a bunch of times on the `http://127.0.0.1:8856/github/commits?_facet=repo&_facet=committer&_trace=1&_noparallel=` page - it generated this SVG profile for me.
The area on the right is the threads running the DB queries:
![profile](https://user-images.githubusercontent.com/9599/165669677-5461ede5-3dc4-4b49-8319-bfe5fd8a723d.svg)
Interactive version here: https://static.simonwillison.net/static/2022/datasette-parallel-profile.svg","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1217759117,
https://github.com/simonw/datasette/issues/1728#issuecomment-1111698307,https://api.github.com/repos/simonw/datasette/issues/1728,1111698307,IC_kwDOBm6k_c5CQyuD,9599,2022-04-28T03:18:02Z,2022-04-28T03:18:02Z,OWNER,If the behaviour you are seeing is because the database is running in immutable mode then that's a bug - you should get a useful error message instead!,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1218133366,
https://github.com/simonw/datasette/issues/1728#issuecomment-1111697985,https://api.github.com/repos/simonw/datasette/issues/1728,1111697985,IC_kwDOBm6k_c5CQypB,9599,2022-04-28T03:17:20Z,2022-04-28T03:17:20Z,OWNER,"How did you deploy to Cloud Run?
`datasette publish cloudrun` defaults to running databases there in `-i` immutable mode, because if you managed to change a file on disk on Cloud Run those changes would be lost the next time your container restarted there.
That's why I upgraded `datasette-publish-fly` to provide a way of working with their volumes support - they're the best option I know of right now for running Datasette in a container with a persistent volume that can accept writes: https://simonwillison.net/2022/Feb/15/fly-volumes/","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1218133366,
https://github.com/simonw/datasette/issues/1727#issuecomment-1111683539,https://api.github.com/repos/simonw/datasette/issues/1727,1111683539,IC_kwDOBm6k_c5CQvHT,9599,2022-04-28T02:47:57Z,2022-04-28T02:47:57Z,OWNER,"Maybe this is the Python GIL after all?
I've been hoping that the GIL won't be an issue because the `sqlite3` module releases the GIL for the duration of the execution of a SQL query - see https://github.com/python/cpython/blob/f348154c8f8a9c254503306c59d6779d4d09b3a9/Modules/_sqlite/cursor.c#L749-L759
So I've been hoping this means that SQLite code itself can run concurrently on multiple cores even when Python threads cannot.
But maybe I'm misunderstanding how that works?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1217759117,
https://github.com/simonw/datasette/issues/1727#issuecomment-1111681513,https://api.github.com/repos/simonw/datasette/issues/1727,1111681513,IC_kwDOBm6k_c5CQunp,9599,2022-04-28T02:44:26Z,2022-04-28T02:44:26Z,OWNER,"I could try `py-spy top`, which I previously used here:
- https://github.com/simonw/datasette/issues/1673","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1217759117,
https://github.com/simonw/datasette/issues/1727#issuecomment-1111661331,https://api.github.com/repos/simonw/datasette/issues/1727,1111661331,IC_kwDOBm6k_c5CQpsT,9599,2022-04-28T02:07:31Z,2022-04-28T02:07:31Z,OWNER,Asked on the SQLite forum about this here: https://sqlite.org/forum/forumpost/ffbfa9f38e,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1217759117,
https://github.com/simonw/datasette/issues/1727#issuecomment-1111602802,https://api.github.com/repos/simonw/datasette/issues/1727,1111602802,IC_kwDOBm6k_c5CQbZy,9599,2022-04-28T00:21:35Z,2022-04-28T00:21:35Z,OWNER,"Tried this but I'm getting back an empty JSON array of traces at the bottom of the page most of the time (intermittently it works correctly):
```diff
diff --git a/datasette/database.py b/datasette/database.py
index ba594a8..d7f9172 100644
--- a/datasette/database.py
+++ b/datasette/database.py
@@ -7,7 +7,7 @@ import sys
import threading
import uuid
-from .tracer import trace
+from .tracer import trace, trace_child_tasks
from .utils import (
detect_fts,
detect_primary_keys,
@@ -207,30 +207,31 @@ class Database:
time_limit_ms = custom_time_limit
with sqlite_timelimit(conn, time_limit_ms):
- try:
- cursor = conn.cursor()
- cursor.execute(sql, params if params is not None else {})
- max_returned_rows = self.ds.max_returned_rows
- if max_returned_rows == page_size:
- max_returned_rows += 1
- if max_returned_rows and truncate:
- rows = cursor.fetchmany(max_returned_rows + 1)
- truncated = len(rows) > max_returned_rows
- rows = rows[:max_returned_rows]
- else:
- rows = cursor.fetchall()
- truncated = False
- except (sqlite3.OperationalError, sqlite3.DatabaseError) as e:
- if e.args == (""interrupted"",):
- raise QueryInterrupted(e, sql, params)
- if log_sql_errors:
- sys.stderr.write(
- ""ERROR: conn={}, sql = {}, params = {}: {}\n"".format(
- conn, repr(sql), params, e
+ with trace(""sql"", database=self.name, sql=sql.strip(), params=params):
+ try:
+ cursor = conn.cursor()
+ cursor.execute(sql, params if params is not None else {})
+ max_returned_rows = self.ds.max_returned_rows
+ if max_returned_rows == page_size:
+ max_returned_rows += 1
+ if max_returned_rows and truncate:
+ rows = cursor.fetchmany(max_returned_rows + 1)
+ truncated = len(rows) > max_returned_rows
+ rows = rows[:max_returned_rows]
+ else:
+ rows = cursor.fetchall()
+ truncated = False
+ except (sqlite3.OperationalError, sqlite3.DatabaseError) as e:
+ if e.args == (""interrupted"",):
+ raise QueryInterrupted(e, sql, params)
+ if log_sql_errors:
+ sys.stderr.write(
+ ""ERROR: conn={}, sql = {}, params = {}: {}\n"".format(
+ conn, repr(sql), params, e
+ )
)
- )
- sys.stderr.flush()
- raise
+ sys.stderr.flush()
+ raise
if truncate:
return Results(rows, truncated, cursor.description)
@@ -238,9 +239,8 @@ class Database:
else:
return Results(rows, False, cursor.description)
- with trace(""sql"", database=self.name, sql=sql.strip(), params=params):
- results = await self.execute_fn(sql_operation_in_thread)
- return results
+ with trace_child_tasks():
+ return await self.execute_fn(sql_operation_in_thread)
@property
def size(self):
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1217759117,
https://github.com/simonw/datasette/issues/1727#issuecomment-1111597176,https://api.github.com/repos/simonw/datasette/issues/1727,1111597176,IC_kwDOBm6k_c5CQaB4,9599,2022-04-28T00:11:44Z,2022-04-28T00:11:44Z,OWNER,"Though it would be interesting to also have the trace reveal how much time is spent in the functions that wrap that core SQL - the stuff that is being measured at the moment.
I have a hunch that this could help solve the over-arching performance mystery.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1217759117,
https://github.com/simonw/datasette/issues/1727#issuecomment-1111595319,https://api.github.com/repos/simonw/datasette/issues/1727,1111595319,IC_kwDOBm6k_c5CQZk3,9599,2022-04-28T00:09:45Z,2022-04-28T00:11:01Z,OWNER,"Here's where read queries are instrumented: https://github.com/simonw/datasette/blob/7a6654a253dee243518dc542ce4c06dbb0d0801d/datasette/database.py#L241-L242
So the instrumentation is actually capturing quite a bit of Python activity before it gets to SQLite:
https://github.com/simonw/datasette/blob/7a6654a253dee243518dc542ce4c06dbb0d0801d/datasette/database.py#L179-L190
And then:
https://github.com/simonw/datasette/blob/7a6654a253dee243518dc542ce4c06dbb0d0801d/datasette/database.py#L204-L233
Ideally I'd like that `trace()` block to wrap just the `cursor.execute()` and `cursor.fetchmany(...)` or `cursor.fetchall()` calls.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1217759117,
https://github.com/simonw/datasette/issues/1727#issuecomment-1111558204,https://api.github.com/repos/simonw/datasette/issues/1727,1111558204,IC_kwDOBm6k_c5CQQg8,9599,2022-04-27T22:58:39Z,2022-04-27T22:58:39Z,OWNER,"I should check my timing mechanism. Am I capturing the time taken just in SQLite or does it include time spent in Python crossing between async and threaded world and waiting for a thread pool worker to become available?
That could explain the longer query times.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1217759117,
https://github.com/simonw/datasette/issues/1727#issuecomment-1111553029,https://api.github.com/repos/simonw/datasette/issues/1727,1111553029,IC_kwDOBm6k_c5CQPQF,9599,2022-04-27T22:48:21Z,2022-04-27T22:48:21Z,OWNER,I wonder if it would be worth exploring multiprocessing here.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1217759117,
https://github.com/simonw/datasette/issues/1727#issuecomment-1111551076,https://api.github.com/repos/simonw/datasette/issues/1727,1111551076,IC_kwDOBm6k_c5CQOxk,9599,2022-04-27T22:44:51Z,2022-04-27T22:45:04Z,OWNER,Really wild idea: what if I created three copies of the SQLite database file - as three separate file names - and then balanced the parallel queries across all these? Any chance that could avoid any mysterious locking issues?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1217759117,
https://github.com/simonw/datasette/issues/1727#issuecomment-1111535818,https://api.github.com/repos/simonw/datasette/issues/1727,1111535818,IC_kwDOBm6k_c5CQLDK,9599,2022-04-27T22:18:45Z,2022-04-27T22:18:45Z,OWNER,"Another avenue: https://twitter.com/weargoggles/status/1519426289920270337
> SQLite has its own mutexes to provide thread safety, which as another poster noted are out of play in multi process setups. Perhaps downgrading from the “serializable” to “multi-threaded” safety would be okay for Datasette? https://sqlite.org/c3ref/c_config_covering_index_scan.html#sqliteconfigmultithread
Doesn't look like there's an obvious way to access that from Python via the `sqlite3` module though.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1217759117,
https://github.com/simonw/datasette/issues/1727#issuecomment-1111485722,https://api.github.com/repos/simonw/datasette/issues/1727,1111485722,IC_kwDOBm6k_c5CP-0a,9599,2022-04-27T21:08:20Z,2022-04-27T21:08:20Z,OWNER,"Tried that and it didn't seem to make a difference either.
I really need a much deeper view of what's going on here.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1217759117,
https://github.com/simonw/datasette/issues/1727#issuecomment-1111462442,https://api.github.com/repos/simonw/datasette/issues/1727,1111462442,IC_kwDOBm6k_c5CP5Iq,9599,2022-04-27T20:40:59Z,2022-04-27T20:42:49Z,OWNER,"This looks VERY relevant: [SQLite Shared-Cache Mode](https://www.sqlite.org/sharedcache.html):
> SQLite includes a special ""shared-cache"" mode (disabled by default) intended for use in embedded servers. If shared-cache mode is enabled and a thread establishes multiple connections to the same database, the connections share a single data and schema cache. This can significantly reduce the quantity of memory and IO required by the system.
Enabled as part of the URI filename:
ATTACH 'file:aux.db?cache=shared' AS aux;
Turns out I'm already using this for in-memory databases that have `.memory_name` set, but not (yet) for regular file-backed databases:
https://github.com/simonw/datasette/blob/7a6654a253dee243518dc542ce4c06dbb0d0801d/datasette/database.py#L73-L75
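If that were extended to regular file-backed databases, the connection might be opened with a URI filename along these lines (just a sketch of the stdlib API, not something Datasette does yet):
```python
import sqlite3

# Hypothetical: opt a file-backed database into shared-cache mode via a URI filename
conn = sqlite3.connect(
    'file:fixtures.db?mode=ro&cache=shared',
    uri=True,
    check_same_thread=False,
)
```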
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1217759117,
https://github.com/simonw/datasette/issues/1727#issuecomment-1111460068,https://api.github.com/repos/simonw/datasette/issues/1727,1111460068,IC_kwDOBm6k_c5CP4jk,9599,2022-04-27T20:38:32Z,2022-04-27T20:38:32Z,OWNER,WAL mode didn't seem to make a difference. I thought there was a chance it might help multiple read connections operate at the same time but it looks like it really does only matter for when writes are going on.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1217759117,
https://github.com/simonw/datasette/issues/1727#issuecomment-1111456500,https://api.github.com/repos/simonw/datasette/issues/1727,1111456500,IC_kwDOBm6k_c5CP3r0,9599,2022-04-27T20:36:01Z,2022-04-27T20:36:01Z,OWNER,"Yeah all of this is pretty much assuming read-only connections. Datasette has a separate mechanism for ensuring that writes are executed one at a time against a dedicated connection from an in-memory queue:
- https://github.com/simonw/datasette/issues/682","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1217759117,
https://github.com/simonw/datasette/issues/1727#issuecomment-1111442012,https://api.github.com/repos/simonw/datasette/issues/1727,1111442012,IC_kwDOBm6k_c5CP0Jc,9599,2022-04-27T20:19:00Z,2022-04-27T20:19:00Z,OWNER,"Something worth digging into: are these parallel queries running against the same SQLite connection or are they each running against a separate SQLite connection?
Just realized I know the answer: they're running against separate SQLite connections, because that's how the time limit mechanism works: it installs a progress handler for each connection which terminates it after a set time.
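For context, that time limit mechanism is roughly this shape (a simplified sketch, not the exact Datasette implementation):
```python
import sqlite3
import time

def set_time_limit(conn, ms):
    # Simplified sketch: check a wall-clock deadline every 1000 SQLite VM instructions;
    # returning a non-zero value aborts the current statement with an interrupted error
    deadline = time.monotonic() + ms / 1000
    conn.set_progress_handler(lambda: 1 if time.monotonic() > deadline else 0, 1000)

conn = sqlite3.connect('fixtures.db')
set_time_limit(conn, 250)
```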
This means that if SQLite benefits from multiple threads using the same connection (due to shared caches or similar) then Datasette will not be seeing those benefits.
It also means that if there's some mechanism within SQLite that penalizes you for having multiple parallel connections to a single file (just guessing here, maybe there's some kind of locking going on?) then Datasette will suffer those penalties.
I should try seeing what happens with WAL mode enabled.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1217759117,
https://github.com/simonw/datasette/issues/1727#issuecomment-1111432375,https://api.github.com/repos/simonw/datasette/issues/1727,1111432375,IC_kwDOBm6k_c5CPxy3,9599,2022-04-27T20:07:57Z,2022-04-27T20:07:57Z,OWNER,Also useful: https://avi.im/blag/2021/fast-sqlite-inserts/ - from a tip on Twitter: https://twitter.com/ricardoanderegg/status/1519402047556235264,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1217759117,
https://github.com/simonw/datasette/issues/1727#issuecomment-1111431785,https://api.github.com/repos/simonw/datasette/issues/1727,1111431785,IC_kwDOBm6k_c5CPxpp,9599,2022-04-27T20:07:16Z,2022-04-27T20:07:16Z,OWNER,"I think I need some much more in-depth tracing tricks for this.
https://www.maartenbreddels.com/perf/jupyter/python/tracing/gil/2021/01/14/Tracing-the-Python-GIL.html looks relevant - uses the `perf` tool on Linux.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1217759117,
https://github.com/simonw/datasette/issues/1727#issuecomment-1111408273,https://api.github.com/repos/simonw/datasette/issues/1727,1111408273,IC_kwDOBm6k_c5CPr6R,9599,2022-04-27T19:40:51Z,2022-04-27T19:42:17Z,OWNER,"Relevant: here's the code that sets up a Datasette SQLite connection: https://github.com/simonw/datasette/blob/7a6654a253dee243518dc542ce4c06dbb0d0801d/datasette/database.py#L73-L96
It's using `check_same_thread=False` - here's [the Python docs on that](https://docs.python.org/3/library/sqlite3.html#sqlite3.connect):
> By default, *check_same_thread* is [`True`](https://docs.python.org/3/library/constants.html#True ""True"") and only the creating thread may use the connection. If set [`False`](https://docs.python.org/3/library/constants.html#False ""False""), the returned connection may be shared across multiple threads. When using multiple threads with the same connection writing operations should be serialized by the user to avoid data corruption.
This is why Datasette reserves a single connection for write queries and queues them up in memory, [as described here](https://simonwillison.net/2020/Feb/26/weeknotes-datasette-writes/).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1217759117,
https://github.com/simonw/datasette/issues/1727#issuecomment-1111390433,https://api.github.com/repos/simonw/datasette/issues/1727,1111390433,IC_kwDOBm6k_c5CPnjh,9599,2022-04-27T19:21:02Z,2022-04-27T19:21:02Z,OWNER,"One weird thing: I noticed that in the parallel trace above the SQL query bars are wider. Mousover shows duration in ms, and I got 13ms for this query:
select message as value, count(*) as n from (
But in the `?_noparallel=1` version that same query took 2.97ms.
Given those numbers though I would expect the overall page time to be MUCH worse for the parallel version - but the page load times are instead very close to each other, with parallel often winning.
This is super-weird.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1217759117,
https://github.com/simonw/datasette/issues/1727#issuecomment-1111385875,https://api.github.com/repos/simonw/datasette/issues/1727,1111385875,IC_kwDOBm6k_c5CPmcT,9599,2022-04-27T19:16:57Z,2022-04-27T19:16:57Z,OWNER,"I just remembered the `--setting num_sql_threads` option... which defaults to 3! https://github.com/simonw/datasette/blob/942411ef946e9a34a2094944d3423cddad27efd3/datasette/app.py#L109-L113
Would explain why the first trace never seems to show more than three SQL queries executing at once.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1217759117,
https://github.com/simonw/datasette/issues/1727#issuecomment-1111380282,https://api.github.com/repos/simonw/datasette/issues/1727,1111380282,IC_kwDOBm6k_c5CPlE6,9599,2022-04-27T19:10:27Z,2022-04-27T19:10:27Z,OWNER,"Wrote more about that here: https://simonwillison.net/2022/Apr/27/parallel-queries/
Compare https://latest-with-plugins.datasette.io/github/commits?_facet=repo&_facet=committer&_trace=1
![image](https://user-images.githubusercontent.com/9599/165601503-2083c5d2-d740-405c-b34d-85570744ca82.png)
With the same thing but with parallel execution disabled:
https://latest-with-plugins.datasette.io/github/commits?_facet=repo&_facet=committer&_trace=1&_noparallel=1
![image](https://user-images.githubusercontent.com/9599/165601525-98abbfb1-5631-4040-b6bd-700948d1db6e.png)
Those total page load time numbers are very similar. Is this parallel optimization worthwhile?
Maybe it's only worth it on larger databases? Or maybe larger databases perform worse with this?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1217759117,
https://github.com/simonw/datasette/issues/1724#issuecomment-1110585475,https://api.github.com/repos/simonw/datasette/issues/1724,1110585475,IC_kwDOBm6k_c5CMjCD,9599,2022-04-27T06:15:14Z,2022-04-27T06:15:14Z,OWNER,"Yeah, that page is 438K (but only 20K gzipped).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1216619276,
https://github.com/simonw/datasette/issues/1724#issuecomment-1110370095,https://api.github.com/repos/simonw/datasette/issues/1724,1110370095,IC_kwDOBm6k_c5CLucv,9599,2022-04-27T00:18:30Z,2022-04-27T00:18:30Z,OWNER,"So this isn't a bug here, it's working as intended.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1216619276,
https://github.com/simonw/datasette/issues/1724#issuecomment-1110369004,https://api.github.com/repos/simonw/datasette/issues/1724,1110369004,IC_kwDOBm6k_c5CLuLs,9599,2022-04-27T00:16:35Z,2022-04-27T00:17:04Z,OWNER,"I bet this is because it's exceeding the size limit: https://github.com/simonw/datasette/blob/da53e0360da4771ffb56a8e3eb3f7476f3168299/datasette/tracer.py#L80-L88
https://github.com/simonw/datasette/blob/da53e0360da4771ffb56a8e3eb3f7476f3168299/datasette/tracer.py#L102-L113","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1216619276,
https://github.com/simonw/datasette/issues/1723#issuecomment-1110330554,https://api.github.com/repos/simonw/datasette/issues/1723,1110330554,IC_kwDOBm6k_c5CLky6,9599,2022-04-26T23:06:20Z,2022-04-26T23:06:20Z,OWNER,Deployed here: https://latest-with-plugins.datasette.io/github/commits?_facet=repo&_trace=1&_facet=committer,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1216508080,
https://github.com/simonw/datasette/issues/1723#issuecomment-1110305790,https://api.github.com/repos/simonw/datasette/issues/1723,1110305790,IC_kwDOBm6k_c5CLev-,9599,2022-04-26T22:19:04Z,2022-04-26T22:19:04Z,OWNER,"I realized that seeing the total time in queries wasn't enough to understand this, because if the queries were executed in serial or parallel it should still sum up to the same amount of SQL time (roughly).
Instead I need to know how long the page took to render. But that's hard to display on the page since you can't measure it until rendering has finished!
So I built an ASGI plugin to handle that measurement: https://github.com/simonw/datasette-total-page-time
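The core idea is just an ASGI wrapper that times the whole request/response cycle - something like this sketch (an illustration of the approach, not the actual plugin code):
```python
import time

class TotalPageTime:
    # Sketch: ASGI middleware that measures wall-clock time for the full response
    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope['type'] != 'http':
            return await self.app(scope, receive, send)
        start = time.perf_counter()

        async def timed_send(message):
            if message['type'] == 'http.response.body' and not message.get('more_body'):
                elapsed_ms = (time.perf_counter() - start) * 1000
                print(f'total page time: {elapsed_ms:.1f}ms')
            await send(message)

        await self.app(scope, receive, timed_send)
```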
And with that plugin installed, `http://127.0.0.1:8001/global-power-plants/global-power-plants?_facet=primary_fuel&_facet=other_fuel2&_facet=other_fuel1&_parallel=1` (the parallel version) takes 377ms:
While `http://127.0.0.1:8001/global-power-plants/global-power-plants?_facet=primary_fuel&_facet=other_fuel2&_facet=other_fuel1` (the serial version) takes 762ms:
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1216508080,
https://github.com/simonw/datasette/issues/1723#issuecomment-1110279869,https://api.github.com/repos/simonw/datasette/issues/1723,1110279869,IC_kwDOBm6k_c5CLYa9,9599,2022-04-26T21:45:39Z,2022-04-26T21:45:39Z,OWNER,"Getting some nice traces out of this:
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1216508080,
https://github.com/simonw/datasette/issues/1723#issuecomment-1110278577,https://api.github.com/repos/simonw/datasette/issues/1723,1110278577,IC_kwDOBm6k_c5CLYGx,9599,2022-04-26T21:44:04Z,2022-04-26T21:44:04Z,OWNER,"And some simple benchmarks with `ab` - using the `?_parallel=1` hack to try it with and without a parallel `asyncio.gather()`:
```
~ % ab -n 100 'http://127.0.0.1:8001/global-power-plants/global-power-plants?_facet=primary_fuel&_facet=other_fuel1&_facet=other_fuel3&_facet=other_fuel2'
This is ApacheBench, Version 2.3 <$Revision: 1879490 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 127.0.0.1 (be patient).....done
Server Software: uvicorn
Server Hostname: 127.0.0.1
Server Port: 8001
Document Path: /global-power-plants/global-power-plants?_facet=primary_fuel&_facet=other_fuel1&_facet=other_fuel3&_facet=other_fuel2
Document Length: 314187 bytes
Concurrency Level: 1
Time taken for tests: 68.279 seconds
Complete requests: 100
Failed requests: 13
(Connect: 0, Receive: 0, Length: 13, Exceptions: 0)
Total transferred: 31454937 bytes
HTML transferred: 31418437 bytes
Requests per second: 1.46 [#/sec] (mean)
Time per request: 682.787 [ms] (mean)
Time per request: 682.787 [ms] (mean, across all concurrent requests)
Transfer rate: 449.89 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 621 683 68.0 658 993
Waiting: 620 682 68.0 657 992
Total: 621 683 68.0 658 993
Percentage of the requests served within a certain time (ms)
50% 658
66% 678
75% 687
80% 711
90% 763
95% 879
98% 926
99% 993
100% 993 (longest request)
----
In parallel:
~ % ab -n 100 'http://127.0.0.1:8001/global-power-plants/global-power-plants?_facet=primary_fuel&_facet=other_fuel1&_facet=other_fuel3&_facet=other_fuel2&_parallel=1'
This is ApacheBench, Version 2.3 <$Revision: 1879490 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 127.0.0.1 (be patient).....done
Server Software: uvicorn
Server Hostname: 127.0.0.1
Server Port: 8001
Document Path: /global-power-plants/global-power-plants?_facet=primary_fuel&_facet=other_fuel1&_facet=other_fuel3&_facet=other_fuel2&_parallel=1
Document Length: 315703 bytes
Concurrency Level: 1
Time taken for tests: 34.763 seconds
Complete requests: 100
Failed requests: 11
(Connect: 0, Receive: 0, Length: 11, Exceptions: 0)
Total transferred: 31607988 bytes
HTML transferred: 31570288 bytes
Requests per second: 2.88 [#/sec] (mean)
Time per request: 347.632 [ms] (mean)
Time per request: 347.632 [ms] (mean, across all concurrent requests)
Transfer rate: 887.93 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 311 347 28.0 338 450
Waiting: 311 347 28.0 338 450
Total: 312 348 28.0 338 451
Percentage of the requests served within a certain time (ms)
50% 338
66% 348
75% 361
80% 367
90% 396
95% 408
98% 436
99% 451
100% 451 (longest request)
----
With concurrency 10, not parallel:
~ % ab -c 10 -n 100 'http://127.0.0.1:8001/global-power-plants/global-power-plants?_facet=primary_fuel&_facet=other_fuel1&_facet=other_fuel3&_facet=other_fuel2&_parallel='
This is ApacheBench, Version 2.3 <$Revision: 1879490 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 127.0.0.1 (be patient).....done
Server Software: uvicorn
Server Hostname: 127.0.0.1
Server Port: 8001
Document Path: /global-power-plants/global-power-plants?_facet=primary_fuel&_facet=other_fuel1&_facet=other_fuel3&_facet=other_fuel2&_parallel=
Document Length: 314346 bytes
Concurrency Level: 10
Time taken for tests: 38.408 seconds
Complete requests: 100
Failed requests: 93
(Connect: 0, Receive: 0, Length: 93, Exceptions: 0)
Total transferred: 31471333 bytes
HTML transferred: 31433733 bytes
Requests per second: 2.60 [#/sec] (mean)
Time per request: 3840.829 [ms] (mean)
Time per request: 384.083 [ms] (mean, across all concurrent requests)
Transfer rate: 800.18 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.1 0 1
Processing: 685 3719 354.0 3774 4096
Waiting: 684 3707 353.7 3750 4095
Total: 685 3719 354.0 3774 4096
Percentage of the requests served within a certain time (ms)
50% 3774
66% 3832
75% 3855
80% 3878
90% 3944
95% 4006
98% 4057
99% 4096
100% 4096 (longest request)
----
Concurrency 10 parallel:
~ % ab -c 10 -n 100 'http://127.0.0.1:8001/global-power-plants/global-power-plants?_facet=primary_fuel&_facet=other_fuel1&_facet=other_fuel3&_facet=other_fuel2&_parallel=1'
This is ApacheBench, Version 2.3 <$Revision: 1879490 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 127.0.0.1 (be patient).....done
Server Software: uvicorn
Server Hostname: 127.0.0.1
Server Port: 8001
Document Path: /global-power-plants/global-power-plants?_facet=primary_fuel&_facet=other_fuel1&_facet=other_fuel3&_facet=other_fuel2&_parallel=1
Document Length: 315703 bytes
Concurrency Level: 10
Time taken for tests: 36.762 seconds
Complete requests: 100
Failed requests: 89
(Connect: 0, Receive: 0, Length: 89, Exceptions: 0)
Total transferred: 31606516 bytes
HTML transferred: 31568816 bytes
Requests per second: 2.72 [#/sec] (mean)
Time per request: 3676.182 [ms] (mean)
Time per request: 367.618 [ms] (mean, across all concurrent requests)
Transfer rate: 839.61 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.1 0 0
Processing: 381 3602 419.6 3609 4458
Waiting: 381 3586 418.7 3607 4457
Total: 381 3603 419.6 3609 4458
Percentage of the requests served within a certain time (ms)
50% 3609
66% 3741
75% 3791
80% 3821
90% 3972
95% 4074
98% 4386
99% 4458
100% 4458 (longest request)
Trying -c 3 instead. Non parallel:
~ % ab -c 3 -n 100 'http://127.0.0.1:8001/global-power-plants/global-power-plants?_facet=primary_fuel&_facet=other_fuel1&_facet=other_fuel3&_facet=other_fuel2&_parallel='
This is ApacheBench, Version 2.3 <$Revision: 1879490 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 127.0.0.1 (be patient).....done
Server Software: uvicorn
Server Hostname: 127.0.0.1
Server Port: 8001
Document Path: /global-power-plants/global-power-plants?_facet=primary_fuel&_facet=other_fuel1&_facet=other_fuel3&_facet=other_fuel2&_parallel=
Document Length: 314346 bytes
Concurrency Level: 3
Time taken for tests: 39.365 seconds
Complete requests: 100
Failed requests: 83
(Connect: 0, Receive: 0, Length: 83, Exceptions: 0)
Total transferred: 31470808 bytes
HTML transferred: 31433208 bytes
Requests per second: 2.54 [#/sec] (mean)
Time per request: 1180.955 [ms] (mean)
Time per request: 393.652 [ms] (mean, across all concurrent requests)
Transfer rate: 780.72 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 731 1153 126.2 1189 1359
Waiting: 730 1151 125.9 1188 1358
Total: 731 1153 126.2 1189 1359
Percentage of the requests served within a certain time (ms)
50% 1189
66% 1221
75% 1234
80% 1247
90% 1296
95% 1309
98% 1343
99% 1359
100% 1359 (longest request)
----
Parallel:
~ % ab -c 3 -n 100 'http://127.0.0.1:8001/global-power-plants/global-power-plants?_facet=primary_fuel&_facet=other_fuel1&_facet=other_fuel3&_facet=other_fuel2&_parallel=1'
This is ApacheBench, Version 2.3 <$Revision: 1879490 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 127.0.0.1 (be patient).....done
Server Software: uvicorn
Server Hostname: 127.0.0.1
Server Port: 8001
Document Path: /global-power-plants/global-power-plants?_facet=primary_fuel&_facet=other_fuel1&_facet=other_fuel3&_facet=other_fuel2&_parallel=1
Document Length: 315703 bytes
Concurrency Level: 3
Time taken for tests: 34.530 seconds
Complete requests: 100
Failed requests: 18
(Connect: 0, Receive: 0, Length: 18, Exceptions: 0)
Total transferred: 31606179 bytes
HTML transferred: 31568479 bytes
Requests per second: 2.90 [#/sec] (mean)
Time per request: 1035.902 [ms] (mean)
Time per request: 345.301 [ms] (mean, across all concurrent requests)
Transfer rate: 893.87 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 412 1020 104.4 1018 1280
Waiting: 411 1018 104.1 1014 1275
Total: 412 1021 104.4 1018 1280
Percentage of the requests served within a certain time (ms)
50% 1018
66% 1041
75% 1061
80% 1079
90% 1136
95% 1176
98% 1251
99% 1280
100% 1280 (longest request)
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1216508080,
https://github.com/simonw/datasette/issues/1723#issuecomment-1110278182,https://api.github.com/repos/simonw/datasette/issues/1723,1110278182,IC_kwDOBm6k_c5CLYAm,9599,2022-04-26T21:43:34Z,2022-04-26T21:43:34Z,OWNER,"Here's the diff I'm using:
```diff
diff --git a/datasette/views/table.py b/datasette/views/table.py
index d66adb8..f15ef1e 100644
--- a/datasette/views/table.py
+++ b/datasette/views/table.py
@@ -1,3 +1,4 @@
+import asyncio
import itertools
import json
@@ -5,6 +6,7 @@ import markupsafe
from datasette.plugins import pm
from datasette.database import QueryInterrupted
+from datasette import tracer
from datasette.utils import (
await_me_maybe,
CustomRow,
@@ -150,6 +152,16 @@ class TableView(DataView):
default_labels=False,
_next=None,
_size=None,
+ ):
+ with tracer.trace_child_tasks():
+ return await self._data_traced(request, default_labels, _next, _size)
+
+ async def _data_traced(
+ self,
+ request,
+ default_labels=False,
+ _next=None,
+ _size=None,
):
database_route = tilde_decode(request.url_vars[""database""])
table_name = tilde_decode(request.url_vars[""table""])
@@ -159,6 +171,20 @@ class TableView(DataView):
raise NotFound(""Database not found: {}"".format(database_route))
database_name = db.name
+ # For performance profiling purposes, ?_parallel=1 turns on asyncio.gather
+ async def _gather_parallel(*args):
+ return await asyncio.gather(*args)
+
+ async def _gather_sequential(*args):
+ results = []
+ for fn in args:
+ results.append(await fn)
+ return results
+
+ gather = (
+ _gather_parallel if request.args.get(""_parallel"") else _gather_sequential
+ )
+
# If this is a canned query, not a table, then dispatch to QueryView instead
canned_query = await self.ds.get_canned_query(
database_name, table_name, request.actor
@@ -174,8 +200,12 @@ class TableView(DataView):
write=bool(canned_query.get(""write"")),
)
- is_view = bool(await db.get_view_definition(table_name))
- table_exists = bool(await db.table_exists(table_name))
+ is_view, table_exists = map(
+ bool,
+ await gather(
+ db.get_view_definition(table_name), db.table_exists(table_name)
+ ),
+ )
# If table or view not found, return 404
if not is_view and not table_exists:
@@ -497,33 +527,44 @@ class TableView(DataView):
)
)
- if not nofacet:
- for facet in facet_instances:
- (
+ async def execute_facets():
+ if not nofacet:
+ # Run them in parallel
+ facet_awaitables = [facet.facet_results() for facet in facet_instances]
+ facet_awaitable_results = await gather(*facet_awaitables)
+ for (
instance_facet_results,
instance_facets_timed_out,
- ) = await facet.facet_results()
- for facet_info in instance_facet_results:
- base_key = facet_info[""name""]
- key = base_key
- i = 1
- while key in facet_results:
- i += 1
- key = f""{base_key}_{i}""
- facet_results[key] = facet_info
- facets_timed_out.extend(instance_facets_timed_out)
-
- # Calculate suggested facets
+ ) in facet_awaitable_results:
+ for facet_info in instance_facet_results:
+ base_key = facet_info[""name""]
+ key = base_key
+ i = 1
+ while key in facet_results:
+ i += 1
+ key = f""{base_key}_{i}""
+ facet_results[key] = facet_info
+ facets_timed_out.extend(instance_facets_timed_out)
+
suggested_facets = []
- if (
- self.ds.setting(""suggest_facets"")
- and self.ds.setting(""allow_facet"")
- and not _next
- and not nofacet
- and not nosuggest
- ):
- for facet in facet_instances:
- suggested_facets.extend(await facet.suggest())
+
+ async def execute_suggested_facets():
+ # Calculate suggested facets
+ if (
+ self.ds.setting(""suggest_facets"")
+ and self.ds.setting(""allow_facet"")
+ and not _next
+ and not nofacet
+ and not nosuggest
+ ):
+ # Run them in parallel
+ facet_suggest_awaitables = [
+ facet.suggest() for facet in facet_instances
+ ]
+ for suggest_result in await gather(*facet_suggest_awaitables):
+ suggested_facets.extend(suggest_result)
+
+ await gather(execute_facets(), execute_suggested_facets())
# Figure out columns and rows for the query
columns = [r[0] for r in results.description]
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1216508080,
https://github.com/simonw/datasette/issues/1715#issuecomment-1110265087,https://api.github.com/repos/simonw/datasette/issues/1715,1110265087,IC_kwDOBm6k_c5CLUz_,9599,2022-04-26T21:26:17Z,2022-04-26T21:26:17Z,OWNER,"Running facets and facet suggestions in parallel using `asyncio.gather()` turns out to be a lot less hassle than I had thought - maybe I don't need `asyncinject` for this at all?
```diff
if not nofacet:
- for facet in facet_instances:
- (
- instance_facet_results,
- instance_facets_timed_out,
- ) = await facet.facet_results()
+ # Run them in parallel
+ facet_awaitables = [facet.facet_results() for facet in facet_instances]
+ facet_awaitable_results = await asyncio.gather(*facet_awaitables)
+ for (
+ instance_facet_results,
+ instance_facets_timed_out,
+ ) in facet_awaitable_results:
for facet_info in instance_facet_results:
base_key = facet_info[""name""]
key = base_key
@@ -522,8 +540,10 @@ class TableView(DataView):
and not nofacet
and not nosuggest
):
- for facet in facet_instances:
- suggested_facets.extend(await facet.suggest())
+ # Run them in parallel
+ facet_suggest_awaitables = [facet.suggest() for facet in facet_instances]
+ for suggest_result in await asyncio.gather(*facet_suggest_awaitables):
+ suggested_facets.extend(suggest_result)
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1212823665,
https://github.com/simonw/datasette/issues/1715#issuecomment-1110246593,https://api.github.com/repos/simonw/datasette/issues/1715,1110246593,IC_kwDOBm6k_c5CLQTB,9599,2022-04-26T21:03:56Z,2022-04-26T21:03:56Z,OWNER,"Well this is fun... I applied this change:
```diff
diff --git a/datasette/views/table.py b/datasette/views/table.py
index d66adb8..85f9e44 100644
--- a/datasette/views/table.py
+++ b/datasette/views/table.py
@@ -1,3 +1,4 @@
+import asyncio
import itertools
import json
@@ -5,6 +6,7 @@ import markupsafe
from datasette.plugins import pm
from datasette.database import QueryInterrupted
+from datasette import tracer
from datasette.utils import (
await_me_maybe,
CustomRow,
@@ -174,8 +176,11 @@ class TableView(DataView):
write=bool(canned_query.get(""write"")),
)
- is_view = bool(await db.get_view_definition(table_name))
- table_exists = bool(await db.table_exists(table_name))
+ with tracer.trace_child_tasks():
+ is_view, table_exists = map(bool, await asyncio.gather(
+ db.get_view_definition(table_name),
+ db.table_exists(table_name)
+ ))
# If table or view not found, return 404
if not is_view and not table_exists:
```
And now using https://datasette.io/plugins/datasette-pretty-traces I get this:
![CleanShot 2022-04-26 at 14 03 33@2x](https://user-images.githubusercontent.com/9599/165392009-84c4399d-3e94-46d4-ba7b-a64a116cac5c.png)
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1212823665,
https://github.com/simonw/datasette/issues/1715#issuecomment-1110219185,https://api.github.com/repos/simonw/datasette/issues/1715,1110219185,IC_kwDOBm6k_c5CLJmx,9599,2022-04-26T20:28:40Z,2022-04-26T20:56:48Z,OWNER,"The refactor I did in #1719 pretty much clashes with all of the changes in https://github.com/simonw/datasette/commit/5053f1ea83194ecb0a5693ad5dada5b25bf0f7e6 so I'll probably need to start my `api-extras` branch again from scratch.
Using a new `tableview-asyncinject` branch.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1212823665,
https://github.com/simonw/datasette/issues/1715#issuecomment-1110239536,https://api.github.com/repos/simonw/datasette/issues/1715,1110239536,IC_kwDOBm6k_c5CLOkw,9599,2022-04-26T20:54:53Z,2022-04-26T20:54:53Z,OWNER,`pytest tests/test_table_*` runs the tests quickly.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1212823665,
https://github.com/simonw/datasette/issues/1715#issuecomment-1110238896,https://api.github.com/repos/simonw/datasette/issues/1715,1110238896,IC_kwDOBm6k_c5CLOaw,9599,2022-04-26T20:53:59Z,2022-04-26T20:53:59Z,OWNER,I'm going to rename `database` to `database_name` and `table` to `table_name` to avoid confusion with the `Database` object as opposed to the string name for the database.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1212823665,
https://github.com/simonw/datasette/issues/1715#issuecomment-1110229319,https://api.github.com/repos/simonw/datasette/issues/1715,1110229319,IC_kwDOBm6k_c5CLMFH,9599,2022-04-26T20:41:32Z,2022-04-26T20:44:38Z,OWNER,"This time I'm not going to bother with the `filter_args` thing - I'm going to just try to use `asyncinject` to execute some big high level things in parallel - facets, suggested facets, counts, the query - and then combine it with the `extras` mechanism I'm trying to introduce too.
Most importantly: I want that `extra_template()` function that adds more template context for the HTML to be executed as part of an `asyncinject` flow!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1212823665,
https://github.com/simonw/datasette/issues/1720#issuecomment-1110212021,https://api.github.com/repos/simonw/datasette/issues/1720,1110212021,IC_kwDOBm6k_c5CLH21,9599,2022-04-26T20:20:27Z,2022-04-26T20:20:27Z,OWNER,Closing this because I have a good enough idea of the design for now - the details of the parameters can be figured out when I implement this.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1215174094,
https://github.com/simonw/datasette/issues/1720#issuecomment-1109309683,https://api.github.com/repos/simonw/datasette/issues/1720,1109309683,IC_kwDOBm6k_c5CHrjz,9599,2022-04-26T04:12:39Z,2022-04-26T04:12:39Z,OWNER,"I think the rough shape of the three plugin hooks is right. The detailed decisions that are needed concern what the parameters should be, which I think will mainly happen as part of:
- #1715","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1215174094,