6,033 rows sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at author_association body reactions issue performed_via_github_app
885098025 https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-885098025 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 IC_kwDODFE5qs40wYYp UtahDave 306240 2021-07-22T17:47:50Z 2021-07-22T17:47:50Z NONE

Hi @maxhawkins, I'm sorry, I haven't had any time to work on this. I'll have some time tomorrow to test your commits. I think they look great, and I'm happy for your commits to supersede my initial attempt here.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
WIP: Add Gmail takeout mbox import 813880401  
885094284 https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-885094284 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 IC_kwDODFE5qs40wXeM maxhawkins 28565 2021-07-22T17:41:32Z 2021-07-22T17:41:32Z NONE

I added a follow-up commit that deals with emails that don't have a Date header: https://github.com/maxhawkins/google-takeout-to-sqlite/commit/4bc70103582c10802c85a523ef1e99a8a2154aa9
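
For reference, a minimal sketch of the kind of fallback such a commit implies (illustrative only; the actual logic is in the linked commit):

from email.utils import parsedate_to_datetime

def message_date(message):
    # Assumption for illustration: emails without a parseable Date header
    # are still imported, just with a NULL date.
    raw = message.get("Date")
    if raw is None:
        return None
    try:
        return parsedate_to_datetime(raw).isoformat()
    except (TypeError, ValueError):
        return None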

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
WIP: Add Gmail takeout mbox import 813880401  
885022230 https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-885022230 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 IC_kwDODFE5qs40wF4W maxhawkins 28565 2021-07-22T15:51:46Z 2021-07-22T15:51:46Z NONE

One thing I noticed is this importer doesn't save attachments along with the body of the emails. It would be nice if those got stored as blobs in a separate attachments table so attachments can be included while fetching search results.
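
A rough sketch of what that could look like (the table name, columns, and use of sqlite-utils are assumptions, not part of the importer):

import sqlite_utils

def save_attachments(db, message_id, message):
    # Walk the MIME parts and store anything marked as an attachment as a BLOB.
    for part in message.walk():
        if part.get_content_disposition() != "attachment":
            continue
        db["attachments"].insert(
            {
                "message_id": message_id,  # hypothetical FK back to the emails table
                "filename": part.get_filename(),
                "content_type": part.get_content_type(),
                "content": part.get_payload(decode=True),  # raw bytes -> BLOB column
            }
        )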

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
WIP: Add Gmail takeout mbox import 813880401  
884672647 https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-884672647 https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 IC_kwDODFE5qs40uwiH maxhawkins 28565 2021-07-22T05:56:31Z 2021-07-22T14:03:08Z NONE

How does this commit look? https://github.com/maxhawkins/google-takeout-to-sqlite/commit/72802a83fee282eb5d02d388567731ba4301050d

It seems that Takeout's mbox format is pretty simple, so we can get away with just splitting the file on lines beginning with "From ". My commit splits the file every time a line starts with "From " and uses email.message_from_bytes to parse each chunk.

I was able to load a 12GB takeout mbox without the program using more than a couple hundred MB of memory during the import process. It does make us lose the progress bar, but maybe I can add that back in a later commit.
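
A simplified sketch of that streaming approach (illustrative; the real implementation is in the commit linked above):

import email

def iter_messages(path):
    # Read the mbox line by line so only one message is buffered at a time.
    buffer = []
    with open(path, "rb") as f:
        for line in f:
            if line.startswith(b"From ") and buffer:
                yield email.message_from_bytes(b"".join(buffer))
                buffer = []
            buffer.append(line)
    if buffer:
        yield email.message_from_bytes(b"".join(buffer))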

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
WIP: Add Gmail takeout mbox import 813880401  
884910320 https://github.com/simonw/datasette/issues/1401#issuecomment-884910320 https://api.github.com/repos/simonw/datasette/issues/1401 IC_kwDOBm6k_c40vqjw fgregg 536941 2021-07-22T13:26:01Z 2021-07-22T13:26:01Z NONE

ordered lists didn't work either, btw

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
unordered list is not rendering bullet points in description_html on database page 950664971  
884688833 https://github.com/dogsheep/dogsheep-photos/issues/32#issuecomment-884688833 https://api.github.com/repos/dogsheep/dogsheep-photos/issues/32 IC_kwDOD079W840u0fB aaronyih1 10793464 2021-07-22T06:40:25Z 2021-07-22T06:40:25Z NONE

The solution here is to upload an image to the bucket first. The error occurs because the code does not properly handle the case where there are no images in the bucket.
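
A minimal sketch of the defensive fix being described (assuming a boto3 list_objects_v2 call; names here are illustrative):

import boto3

def bucket_keys(bucket_name):
    # list_objects_v2 omits the "Contents" key entirely for an empty bucket,
    # so response["Contents"] raises KeyError; default to an empty list instead.
    s3 = boto3.client("s3")
    response = s3.list_objects_v2(Bucket=bucket_name)
    return [obj["Key"] for obj in response.get("Contents", [])]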

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
KeyError: 'Contents' on running upload 803333769  
817403642 https://github.com/simonw/datasette/pull/1296#issuecomment-817403642 https://api.github.com/repos/simonw/datasette/issues/1296 MDEyOklzc3VlQ29tbWVudDgxNzQwMzY0Mg== codecov[bot] 22429695 2021-04-12T00:29:05Z 2021-07-20T08:52:12Z NONE

Codecov Report

Merging #1296 (527a056) into main (c73af5d) will decrease coverage by 0.11%.
The diff coverage is n/a.

:exclamation: Current head 527a056 differs from pull request most recent head 8f00c31. Consider uploading reports for the commit 8f00c31 to get more accurate results

@@            Coverage Diff             @@
##             main    #1296      +/-   ##
==========================================
- Coverage   91.62%   91.51%   -0.12%     
==========================================
  Files          34       34              
  Lines        4371     4255     -116     
==========================================
- Hits         4005     3894     -111     
+ Misses        366      361       -5     
Impacted Files | Coverage Δ
datasette/tracer.py | 81.60% <0.00%> (-1.35%) :arrow_down:
datasette/views/base.py | 95.01% <0.00%> (-0.42%) :arrow_down:
datasette/facets.py | 89.04% <0.00%> (-0.41%) :arrow_down:
datasette/utils/__init__.py | 94.13% <0.00%> (-0.21%) :arrow_down:
datasette/renderer.py | 94.02% <0.00%> (-0.18%) :arrow_down:
datasette/views/database.py | 97.19% <0.00%> (-0.10%) :arrow_down:
datasette/views/table.py | 95.88% <0.00%> (-0.07%) :arrow_down:
datasette/views/index.py | 96.36% <0.00%> (-0.07%) :arrow_down:
datasette/hookspecs.py | 100.00% <0.00%> (ø)
datasette/utils/testing.py | 95.38% <0.00%> (ø)
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c73af5d...8f00c31. Read the comment docs.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Dockerfile: use Ubuntu 20.10 as base 855446829  
882542519 https://github.com/simonw/datasette/pull/1400#issuecomment-882542519 https://api.github.com/repos/simonw/datasette/issues/1400 IC_kwDOBm6k_c40moe3 codecov[bot] 22429695 2021-07-19T13:20:52Z 2021-07-19T13:20:52Z NONE

Codecov Report

Merging #1400 (e95c685) into main (c73af5d) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main    #1400   +/-   ##
=======================================
  Coverage   91.62%   91.62%           
=======================================
  Files          34       34           
  Lines        4371     4371           
=======================================
  Hits         4005     4005           
  Misses        366      366           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c73af5d...e95c685. Read the comment docs.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Bump black from 21.6b0 to 21.7b0 947640902  
882138084 https://github.com/simonw/datasette/issues/123#issuecomment-882138084 https://api.github.com/repos/simonw/datasette/issues/123 IC_kwDOBm6k_c40lFvk simonw 9599 2021-07-19T00:04:31Z 2021-07-19T00:04:31Z OWNER

I've been thinking more about this one today too. An extension of this (touched on in #417, Datasette Library) would be to support pointing Datasette at a directory and having it automatically load any CSV files it finds anywhere in that folder or its descendants - either loading them fully, or providing a UI that allows users to select a file to open it in Datasette.

For larger files I think the right thing to do is import them into an on-disk SQLite database, which is limited only by available disk space. For smaller files loading them into an in-memory database should work fine.
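
An illustrative sketch of that size-based decision (not Datasette code; the 100 MB threshold and use of sqlite-utils are assumptions):

import csv
import pathlib
import sqlite_utils

def load_csv_folder(folder, threshold=100 * 1024 * 1024):
    paths = list(pathlib.Path(folder).rglob("*.csv"))
    total_size = sum(p.stat().st_size for p in paths)
    # Large imports go to a file on disk; small ones fit in memory.
    if total_size > threshold:
        db = sqlite_utils.Database("csvs.db")
    else:
        db = sqlite_utils.Database(memory=True)
    for path in paths:
        with path.open(newline="") as f:
            db[path.stem].insert_all(csv.DictReader(f), alter=True)
    return db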

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Datasette serve should accept paths/URLs to CSVs and other file formats 275125561  
882096402 https://github.com/simonw/datasette/issues/123#issuecomment-882096402 https://api.github.com/repos/simonw/datasette/issues/123 IC_kwDOBm6k_c40k7kS RayBB 921217 2021-07-18T18:07:29Z 2021-07-18T18:07:29Z NONE

I also love the idea for this feature and wonder if it could work without having to download the whole database into memory at once if it's a rather large db. Obviously this could be slower but could have many use cases.

My comment is partially inspired by this post about streaming sqlite dbs from github pages or such
https://news.ycombinator.com/item?id=27016630

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Datasette serve should accept paths/URLs to CSVs and other file formats 275125561  
882091516 https://github.com/dogsheep/dogsheep-photos/issues/32#issuecomment-882091516 https://api.github.com/repos/dogsheep/dogsheep-photos/issues/32 IC_kwDOD079W840k6X8 aaronyih1 10793464 2021-07-18T17:29:39Z 2021-07-18T17:33:02Z NONE

Same here for US West (N. California) us-west-1. Running on Catalina.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
KeyError: 'Contents' on running upload 803333769  
882052852 https://github.com/simonw/sqlite-utils/issues/297#issuecomment-882052852 https://api.github.com/repos/simonw/sqlite-utils/issues/297 IC_kwDOCGYnMM40kw70 simonw 9599 2021-07-18T12:59:20Z 2021-07-18T12:59:20Z OWNER

I'm not too worried about sqlite-utils memory: if your data is large enough to benefit from this optimization, you should probably use a real file as opposed to a disposable memory database when analyzing it.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Option for importing CSV data using the SQLite .import mechanism 944846776  
882052693 https://github.com/simonw/sqlite-utils/issues/297#issuecomment-882052693 https://api.github.com/repos/simonw/sqlite-utils/issues/297 IC_kwDOCGYnMM40kw5V simonw 9599 2021-07-18T12:57:54Z 2021-07-18T12:57:54Z OWNER

Another implementation option would be to use the CSV virtual table mechanism. This could avoid shelling out to the sqlite3 binary, but requires solving the harder problem of compiling and distributing a loadable SQLite module: https://www.sqlite.org/csv.html

This would also help solve the challenge of making this optimization available to the sqlite-utils memory command. That command operates against an in-memory database so it's not obvious how it could shell out to a binary.
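
As a rough sketch of what using that module involves once compiled (assumes a csv loadable extension built from SQLite's csv.c and a Python build that permits loading extensions):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.enable_load_extension(True)
conn.load_extension("./csv")  # path to the locally compiled csv.c module
conn.enable_load_extension(False)
conn.execute(
    "CREATE VIRTUAL TABLE temp.t USING csv(filename='data.csv', header=true)"
)
print(conn.execute("SELECT count(*) FROM temp.t").fetchone()[0])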

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Option for importing CSV data using the SQLite .import mechanism 944846776  
881932880 https://github.com/simonw/datasette/issues/1199#issuecomment-881932880 https://api.github.com/repos/simonw/datasette/issues/1199 IC_kwDOBm6k_c40kTpQ simonw 9599 2021-07-17T17:39:17Z 2021-07-17T17:39:17Z OWNER

I asked about optimizing performance on the SQLite forum and this came up as a suggestion: https://sqlite.org/forum/forumpost/9a6b9ae8e2048c8b?t=c

I can start by trying this:

PRAGMA mmap_size=268435456;
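
For reference, a quick sketch of trying that on a single connection (illustrative, not the eventual Datasette change):

import sqlite3

conn = sqlite3.connect("fixtures.db")
# mmap_size is set per connection and capped by the compile-time maximum;
# reading the pragma back confirms whether the setting was accepted.
conn.execute("PRAGMA mmap_size = 268435456")  # 256 MB
print(conn.execute("PRAGMA mmap_size").fetchone()[0])
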
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Experiment with PRAGMA mmap_size=N 792652391  
881686662 https://github.com/simonw/datasette/issues/1396#issuecomment-881686662 https://api.github.com/repos/simonw/datasette/issues/1396 IC_kwDOBm6k_c40jXiG simonw 9599 2021-07-16T20:02:44Z 2021-07-16T20:02:44Z OWNER

Confirmed fixed: 0.58.1 was successfully published to Docker Hub in https://github.com/simonw/datasette/runs/3089447346 and the latest tag on https://hub.docker.com/r/datasetteproject/datasette/tags was updated.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
"invalid reference format" publishing Docker image 944903881  
881677620 https://github.com/simonw/datasette/issues/1231#issuecomment-881677620 https://api.github.com/repos/simonw/datasette/issues/1231 IC_kwDOBm6k_c40jVU0 simonw 9599 2021-07-16T19:44:12Z 2021-07-16T19:44:12Z OWNER

That fixed the race condition in the datasette-graphql tests, which is the only place that I've been able to successfully replicate this. I'm going to land this change.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Race condition errors in new refresh_schemas() mechanism 811367257  
881674857 https://github.com/simonw/datasette/issues/1231#issuecomment-881674857 https://api.github.com/repos/simonw/datasette/issues/1231 IC_kwDOBm6k_c40jUpp simonw 9599 2021-07-16T19:38:39Z 2021-07-16T19:38:39Z OWNER

I can't replicate the race condition locally with or without this patch. I'm going to push the commit and then test the CI run from datasette-graphql that was failing against it.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Race condition errors in new refresh_schemas() mechanism 811367257  
881671706 https://github.com/simonw/datasette/issues/1231#issuecomment-881671706 https://api.github.com/repos/simonw/datasette/issues/1231 IC_kwDOBm6k_c40jT4a simonw 9599 2021-07-16T19:32:05Z 2021-07-16T19:32:05Z OWNER

The test suite passes with that change.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Race condition errors in new refresh_schemas() mechanism 811367257  
881668759 https://github.com/simonw/datasette/issues/1231#issuecomment-881668759 https://api.github.com/repos/simonw/datasette/issues/1231 IC_kwDOBm6k_c40jTKX simonw 9599 2021-07-16T19:27:46Z 2021-07-16T19:27:46Z OWNER

Second attempt at this:

diff --git a/datasette/app.py b/datasette/app.py
index 5976d8b..5f348cb 100644
--- a/datasette/app.py
+++ b/datasette/app.py
@@ -224,6 +224,7 @@ class Datasette:
         self.inspect_data = inspect_data
         self.immutables = set(immutables or [])
         self.databases = collections.OrderedDict()
+        self._refresh_schemas_lock = asyncio.Lock()
         self.crossdb = crossdb
         if memory or crossdb or not self.files:
             self.add_database(Database(self, is_memory=True), name="_memory")
@@ -332,6 +333,12 @@ class Datasette:
         self.client = DatasetteClient(self)

     async def refresh_schemas(self):
+        if self._refresh_schemas_lock.locked():
+            return
+        async with self._refresh_schemas_lock:
+            await self._refresh_schemas()
+
+    async def _refresh_schemas(self):
         internal_db = self.databases["_internal"]
         if not self.internal_db_created:
             await init_internal_db(internal_db)
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Race condition errors in new refresh_schemas() mechanism 811367257  
881665383 https://github.com/simonw/datasette/issues/1231#issuecomment-881665383 https://api.github.com/repos/simonw/datasette/issues/1231 IC_kwDOBm6k_c40jSVn simonw 9599 2021-07-16T19:21:35Z 2021-07-16T19:21:35Z OWNER

https://stackoverflow.com/a/25799871/6083 has a good example of using asyncio.Lock():

stuff_lock = asyncio.Lock()

async def get_stuff(url):
    async with stuff_lock:
        if url in cache:
            return cache[url]
        stuff = await aiohttp.request('GET', url)
        cache[url] = stuff
        return stuff
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Race condition errors in new refresh_schemas() mechanism 811367257  
881664408 https://github.com/simonw/datasette/issues/1231#issuecomment-881664408 https://api.github.com/repos/simonw/datasette/issues/1231 IC_kwDOBm6k_c40jSGY simonw 9599 2021-07-16T19:19:35Z 2021-07-16T19:19:35Z OWNER

The only place that calls refresh_schemas() is here: https://github.com/simonw/datasette/blob/dd5ee8e66882c94343cd3f71920878c6cfd0da41/datasette/views/base.py#L120-L124

Ideally only one call to refresh_schemas() would be running at any one time.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Race condition errors in new refresh_schemas() mechanism 811367257  
881663968 https://github.com/simonw/datasette/issues/1231#issuecomment-881663968 https://api.github.com/repos/simonw/datasette/issues/1231 IC_kwDOBm6k_c40jR_g simonw 9599 2021-07-16T19:18:42Z 2021-07-16T19:18:42Z OWNER

The race condition happens inside this method - initially with the call to await init_internal_db(): https://github.com/simonw/datasette/blob/dd5ee8e66882c94343cd3f71920878c6cfd0da41/datasette/app.py#L334-L359

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Race condition errors in new refresh_schemas() mechanism 811367257  
881204782 https://github.com/simonw/datasette/issues/1231#issuecomment-881204782 https://api.github.com/repos/simonw/datasette/issues/1231 IC_kwDOBm6k_c40hh4u simonw 9599 2021-07-16T06:14:12Z 2021-07-16T06:14:12Z OWNER

Here's the traceback I got from datasette-graphql (annoyingly only running the tests in GitHub Actions CI - I've not been able to replicate on my laptop yet):

tests/test_utils.py .                                                    [100%]

=================================== FAILURES ===================================
_________________________ test_graphql_examples[path0] _________________________

ds = <datasette.app.Datasette object at 0x7f6b8b6f8fd0>
path = PosixPath('/home/runner/work/datasette-graphql/datasette-graphql/examples/filters.md')

    @pytest.mark.asyncio
    @pytest.mark.parametrize(
        "path", (pathlib.Path(__file__).parent.parent / "examples").glob("*.md")
    )
    async def test_graphql_examples(ds, path):
        content = path.read_text()
        query = graphql_re.search(content)[1]
        try:
            variables = variables_re.search(content)[1]
        except TypeError:
            variables = "{}"
        expected = json.loads(json_re.search(content)[1])
        response = await ds.client.post(
            "/graphql",
            json={
                "query": query,
                "variables": json.loads(variables),
            },
        )
>       assert response.status_code == 200, response.json()
E       AssertionError: {'data': {'repos_arraycontains': None, 'users_contains': None, 'users_date': None, 'users_endswith': None, ...}, 'erro...", 'path': ['users_gt']}, {'locations': [{'column': 5, 'line': 34}], 'message': "'rows'", 'path': ['users_gte']}, ...]}
E       assert 500 == 200
E        +  where 500 = <Response [500 Internal Server Error]>.status_code

tests/test_graphql.py:142: AssertionError
----------------------------- Captured stderr call -----------------------------
table databases already exists
table databases already exists
table databases already exists
table databases already exists
table databases already exists
table databases already exists
table databases already exists
table databases already exists
table databases already exists
table databases already exists
table databases already exists
table databases already exists
table databases already exists
table databases already exists
table databases already exists
table databases already exists
table databases already exists
table databases already exists
table databases already exists
table databases already exists
table databases already exists
Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.7.11/x64/lib/python3.7/site-packages/datasette/app.py", line 1171, in route_path
    response = await view(request, send)
  File "/opt/hostedtoolcache/Python/3.7.11/x64/lib/python3.7/site-packages/datasette/views/base.py", line 151, in view
    request, **request.scope["url_route"]["kwargs"]
  File "/opt/hostedtoolcache/Python/3.7.11/x64/lib/python3.7/site-packages/datasette/views/base.py", line 123, in dispatch_request
    await self.ds.refresh_schemas()
  File "/opt/hostedtoolcache/Python/3.7.11/x64/lib/python3.7/site-packages/datasette/app.py", line 338, in refresh_schemas
    await init_internal_db(internal_db)
  File "/opt/hostedtoolcache/Python/3.7.11/x64/lib/python3.7/site-packages/datasette/utils/internal_db.py", line 16, in init_internal_db
    block=True,
  File "/opt/hostedtoolcache/Python/3.7.11/x64/lib/python3.7/site-packages/datasette/database.py", line 102, in execute_write
    return await self.execute_write_fn(_inner, block=block)
  File "/opt/hostedtoolcache/Python/3.7.11/x64/lib/python3.7/site-packages/datasette/database.py", line 118, in execute_write_fn
    raise result
  File "/opt/hostedtoolcache/Python/3.7.11/x64/lib/python3.7/site-packages/datasette/database.py", line 139, in _execute_writes
    result = task.fn(conn)
  File "/opt/hostedtoolcache/Python/3.7.11/x64/lib/python3.7/site-packages/datasette/database.py", line 100, in _inner
    return conn.execute(sql, params or [])
sqlite3.OperationalError: table databases already exists
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Race condition errors in new refresh_schemas() mechanism 811367257  
881204343 https://github.com/simonw/datasette/issues/1231#issuecomment-881204343 https://api.github.com/repos/simonw/datasette/issues/1231 IC_kwDOBm6k_c40hhx3 simonw 9599 2021-07-16T06:13:11Z 2021-07-16T06:13:11Z OWNER

This just broke the datasette-graphql test suite: https://github.com/simonw/datasette-graphql/issues/77 - I need to figure out a solution here.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Race condition errors in new refresh_schemas() mechanism 811367257  
881129149 https://github.com/simonw/datasette/issues/1394#issuecomment-881129149 https://api.github.com/repos/simonw/datasette/issues/1394 IC_kwDOBm6k_c40hPa9 simonw 9599 2021-07-16T02:23:32Z 2021-07-16T02:23:32Z OWNER

Wrote about this in the annotated release notes for 0.58: https://simonwillison.net/2021/Jul/16/datasette-058/

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Big performance boost on faceting: skip the inner order by 944870799  
881125124 https://github.com/simonw/datasette/issues/759#issuecomment-881125124 https://api.github.com/repos/simonw/datasette/issues/759 IC_kwDOBm6k_c40hOcE simonw 9599 2021-07-16T02:11:48Z 2021-07-16T02:11:54Z OWNER

I added "searchmode": "raw" as a supported option for table metadata in #1389 and released that in Datasette 0.58.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
fts search on a column doesn't work anymore due to escape_fts 612673948  
880967052 https://github.com/simonw/datasette/issues/1396#issuecomment-880967052 https://api.github.com/repos/simonw/datasette/issues/1396 MDEyOklzc3VlQ29tbWVudDg4MDk2NzA1Mg== simonw 9599 2021-07-15T19:47:25Z 2021-07-15T19:47:25Z OWNER

Actually I'm going to close this now and re-open it if the problem occurs again in the future.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
"invalid reference format" publishing Docker image 944903881  
880900534 https://github.com/simonw/datasette/issues/1394#issuecomment-880900534 https://api.github.com/repos/simonw/datasette/issues/1394 MDEyOklzc3VlQ29tbWVudDg4MDkwMDUzNA== simonw 9599 2021-07-15T17:58:03Z 2021-07-15T17:58:03Z OWNER

Started a conversation about this on the SQLite forum: https://sqlite.org/forum/forumpost/2d76f2bcf65d256a?t=h

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Big performance boost on faceting: skip the inner order by 944870799  
880374156 https://github.com/simonw/datasette/issues/1396#issuecomment-880374156 https://api.github.com/repos/simonw/datasette/issues/1396 MDEyOklzc3VlQ29tbWVudDg4MDM3NDE1Ng== simonw 9599 2021-07-15T04:03:18Z 2021-07-15T04:03:18Z OWNER

I fixed datasette:latest by running the following on my laptop:

docker pull datasetteproject/datasette:0.58
docker tag datasetteproject/datasette:0.58 datasetteproject/datasette:latest
docker login -u datasetteproject -p ...
docker push datasetteproject/datasette:latest

Confirmed on https://hub.docker.com/r/datasetteproject/datasette/tags?page=1&ordering=last_updated that datasette:latest and datasette:0.58 both now have the same digest of 3b5ba478040e.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
"invalid reference format" publishing Docker image 944903881  
880372149 https://github.com/simonw/datasette/issues/1396#issuecomment-880372149 https://api.github.com/repos/simonw/datasette/issues/1396 MDEyOklzc3VlQ29tbWVudDg4MDM3MjE0OQ== simonw 9599 2021-07-15T03:56:49Z 2021-07-15T03:56:49Z OWNER

I'm going to leave this open until I next successfully publish a new version.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
"invalid reference format" publishing Docker image 944903881  
880326049 https://github.com/simonw/datasette/issues/1396#issuecomment-880326049 https://api.github.com/repos/simonw/datasette/issues/1396 MDEyOklzc3VlQ29tbWVudDg4MDMyNjA0OQ== simonw 9599 2021-07-15T01:50:05Z 2021-07-15T01:50:05Z OWNER
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
"invalid reference format" publishing Docker image 944903881  
880325362 https://github.com/simonw/datasette/issues/1396#issuecomment-880325362 https://api.github.com/repos/simonw/datasette/issues/1396 MDEyOklzc3VlQ29tbWVudDg4MDMyNTM2Mg== simonw 9599 2021-07-15T01:48:11Z 2021-07-15T01:48:11Z OWNER
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
"invalid reference format" publishing Docker image 944903881  
880325004 https://github.com/simonw/datasette/issues/1396#issuecomment-880325004 https://api.github.com/repos/simonw/datasette/issues/1396 MDEyOklzc3VlQ29tbWVudDg4MDMyNTAwNA== simonw 9599 2021-07-15T01:47:17Z 2021-07-15T01:47:17Z OWNER

This is the part of the publish workflow that failed and threw the "invalid reference format" error: https://github.com/simonw/datasette/blob/084cfe1e00e1a4c0515390a513aca286eeea20c2/.github/workflows/publish.yml#L100-L119

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
"invalid reference format" publishing Docker image 944903881  
880324637 https://github.com/simonw/datasette/issues/1396#issuecomment-880324637 https://api.github.com/repos/simonw/datasette/issues/1396 MDEyOklzc3VlQ29tbWVudDg4MDMyNDYzNw== simonw 9599 2021-07-15T01:46:26Z 2021-07-15T01:46:26Z OWNER

I manually published the Docker image using https://github.com/simonw/datasette/actions/workflows/push_docker_tag.yml https://github.com/simonw/datasette/runs/3072505126

The 0.58 release shows up on https://hub.docker.com/r/datasetteproject/datasette/tags?page=1&ordering=last_updated now - BUT the latest tag still points to a version from a month ago.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
"invalid reference format" publishing Docker image 944903881  
880287483 https://github.com/simonw/datasette/issues/1394#issuecomment-880287483 https://api.github.com/repos/simonw/datasette/issues/1394 MDEyOklzc3VlQ29tbWVudDg4MDI4NzQ4Mw== simonw 9599 2021-07-15T00:01:47Z 2021-07-15T00:01:47Z OWNER

I wrote this code:

import re

import pytest

_order_by_re = re.compile(r"(^.*) order by [a-zA-Z_][a-zA-Z0-9_]+( desc)?$", re.DOTALL)
_order_by_braces_re = re.compile(r"(^.*) order by \[[^\]]+\]( desc)?$", re.DOTALL)


def strip_order_by(sql):
    for regex in (_order_by_re, _order_by_braces_re):
        match = regex.match(sql)
        if match is not None:
            return match.group(1)
    return sql

@pytest.mark.parametrize(
    "sql,expected",
    [
        ("blah", "blah"),
        ("select * from foo", "select * from foo"),
        ("select * from foo order by bah", "select * from foo"),
        ("select * from foo order by bah desc", "select * from foo"),
        ("select * from foo order by [select]", "select * from foo"),
        ("select * from foo order by [select] desc", "select * from foo"),
    ],
)
def test_strip_order_by(sql, expected):
    assert strip_order_by(sql) == expected

But it turns out I don't need it! The SQL that is passed to the facet class is created by this code: https://github.com/simonw/datasette/blob/ba11ef27edd6981eeb26d7ecf5aa236707f5f8ce/datasette/views/table.py#L677-L684

And the only place that uses that sql_no_limit variable is here: https://github.com/simonw/datasette/blob/ba11ef27edd6981eeb26d7ecf5aa236707f5f8ce/datasette/views/table.py#L733-L745

So I can change that to sql_no_limit_no_order and fix the bug that way instead.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Big performance boost on faceting: skip the inner order by 944870799  
880278256 https://github.com/simonw/datasette/issues/1394#issuecomment-880278256 https://api.github.com/repos/simonw/datasette/issues/1394 MDEyOklzc3VlQ29tbWVudDg4MDI3ODI1Ng== simonw 9599 2021-07-14T23:35:18Z 2021-07-14T23:35:18Z OWNER

The challenge here is that faceting doesn't currently modify the inner SQL at all - it wraps it so that it can work against any SQL statement (though Datasette itself does not yet take advantage of that ability, only offering faceting on table pages).

So just removing the order by wouldn't be appropriate if the inner query looked something like this:

select * from items order by created desc limit 100

Since the intent there would be to return facet counts against only the most recent 100 items.

In SQLite the limit has to come after the order by though, so the fix here could be as easy as using a regular expression to identify queries that end with order by COLUMN (desc)? and stripping off that clause.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Big performance boost on faceting: skip the inner order by 944870799  
880259255 https://github.com/simonw/sqlite-utils/issues/297#issuecomment-880259255 https://api.github.com/repos/simonw/sqlite-utils/issues/297 MDEyOklzc3VlQ29tbWVudDg4MDI1OTI1NQ== simonw 9599 2021-07-14T22:48:41Z 2021-07-14T22:48:41Z OWNER

Should also take advantage of .mode tabs to support sqlite-utils insert blah.db blah blah.csv --tsv --fast
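
A sketch of what that shell-out could look like (the subprocess form and helper name are assumptions; --fast is the option proposed for this issue):

import subprocess

def fast_tsv_import(db_path, table, tsv_path):
    # Feed dot-commands to the sqlite3 CLI on stdin: .mode tabs switches
    # .import to tab-separated values, mirroring a --tsv --fast combination.
    subprocess.run(
        ["sqlite3", db_path],
        input=f".mode tabs\n.import {tsv_path} {table}\n",
        text=True,
        check=True,
    )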

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Option for importing CSV data using the SQLite .import mechanism 944846776  
880257587 https://github.com/simonw/sqlite-utils/issues/297#issuecomment-880257587 https://api.github.com/repos/simonw/sqlite-utils/issues/297 MDEyOklzc3VlQ29tbWVudDg4MDI1NzU4Nw== simonw 9599 2021-07-14T22:44:05Z 2021-07-14T22:44:05Z OWNER

https://unix.stackexchange.com/a/642364 suggests you can also use this to import from stdin, like so:

sqlite3 -csv $database_file_name ".import '|cat -' $table_name"

Here the sqlite3 -csv is an alternative to using .mode csv.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Option for importing CSV data using the SQLite .import mechanism 944846776  
880256865 https://github.com/simonw/sqlite-utils/issues/297#issuecomment-880256865 https://api.github.com/repos/simonw/sqlite-utils/issues/297 MDEyOklzc3VlQ29tbWVudDg4MDI1Njg2NQ== simonw 9599 2021-07-14T22:42:11Z 2021-07-14T22:42:11Z OWNER

A potential workaround for the missing --skip implementation is that the filename can be a command instead, so it could shell out to tail -n +2 filename to skip the header row:

The source argument is the name of a file to be read or, if it begins with a "|" character, specifies a command which will be run to produce the input CSV data.
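
Putting the two together, a hedged sketch (the file and table names are invented):

import subprocess

# On SQLite versions without --skip, drop the CSV header row by letting
# .import read from a command: a source starting with "|" is run as a
# shell command, and tail -n +2 emits everything from line 2 onwards.
subprocess.run(
    ["sqlite3", "-csv", "data.db", ".import '|tail -n +2 data.csv' mytable"],
    check=True,
)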

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Option for importing CSV data using the SQLite .import mechanism 944846776  
880256058 https://github.com/simonw/sqlite-utils/issues/297#issuecomment-880256058 https://api.github.com/repos/simonw/sqlite-utils/issues/297 MDEyOklzc3VlQ29tbWVudDg4MDI1NjA1OA== simonw 9599 2021-07-14T22:40:01Z 2021-07-14T22:40:47Z OWNER

Full docs here: https://www.sqlite.org/draft/cli.html#csv

One catch: how this works has changed in recent SQLite versions: https://www.sqlite.org/changes.html

  • 2020-12-01 (3.34.0) - "Table name quoting works correctly for the .import dot-command"
  • 2020-05-22 (3.32.0) - "Add options to the .import command: --csv, --ascii, --skip"
  • 2017-08-01 (3.20.0) - "The .import command ignores an initial UTF-8 BOM."

The "skip" feature is particularly important to understand. https://www.sqlite.org/draft/cli.html#csv says:

There are two cases to consider: (1) Table "tab1" does not previously exist and (2) table "tab1" does already exist.

In the first case, when the table does not previously exist, the table is automatically created and the content of the first row of the input CSV file is used to determine the name of all the columns in the table. In other words, if the table does not previously exist, the first row of the CSV file is interpreted to be column names and the actual data starts on the second row of the CSV file.

For the second case, when the table already exists, every row of the CSV file, including the first row, is assumed to be actual content. If the CSV file contains an initial row of column labels, you can cause the .import command to skip that initial row using the "--skip 1" option.

But the --skip 1 option is only available in 3.32.0 and higher.
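
So any implementation would likely need a version gate along these lines (a sketch; the helper name is invented):

import subprocess

def cli_supports_skip():
    # .import --skip shipped in SQLite 3.32.0 (2020-05-22).
    output = subprocess.run(
        ["sqlite3", "--version"], capture_output=True, text=True, check=True
    ).stdout
    major, minor = (int(part) for part in output.split()[0].split(".")[:2])
    return (major, minor) >= (3, 32)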

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Option for importing CSV data using the SQLite .import mechanism 944846776  
880153069 https://github.com/simonw/datasette/issues/268#issuecomment-880153069 https://api.github.com/repos/simonw/datasette/issues/268 MDEyOklzc3VlQ29tbWVudDg4MDE1MzA2OQ== simonw 9599 2021-07-14T19:31:00Z 2021-07-14T19:31:00Z OWNER

... though interestingly I can't replicate that error on latest.datasette.io - https://latest.datasette.io/fixtures/searchable?_search=park.&_searchmode=raw

That's running https://latest.datasette.io/-/versions SQLite 3.35.4 whereas https://www.niche-museums.com/-/versions is running 3.27.2 (the most recent version available with Vercel) - but there's nothing in the SQLite changelog between those two versions that suggests changes to how the FTS5 parser works. https://www.sqlite.org/changes.html

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Mechanism for ranking results from SQLite full-text search 323718842  
880150755 https://github.com/simonw/datasette/issues/268#issuecomment-880150755 https://api.github.com/repos/simonw/datasette/issues/268 MDEyOklzc3VlQ29tbWVudDg4MDE1MDc1NQ== simonw 9599 2021-07-14T19:26:47Z 2021-07-14T19:29:08Z OWNER

> What are the side-effects of turning that on in the query string, or even by default as you suggested? I see that you stated in the docs... "to ensure they do not cause any confusion for users who are not aware of them", but I'm not sure what those could be.

Mainly that it's possible to generate SQL queries that crash with an error. This was the example that convinced me to default to escaping:

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Mechanism for ranking results from SQLite full-text search 323718842  
579675357 https://github.com/simonw/datasette/issues/651#issuecomment-579675357 https://api.github.com/repos/simonw/datasette/issues/651 MDEyOklzc3VlQ29tbWVudDU3OTY3NTM1Nw== clausjuhl 2181410 2020-01-29T09:45:00Z 2021-07-14T19:26:06Z NONE

Hi Simon

Thank you for adding the escape_function, but it does not work on my datasette-installation (0.33). I've added the following file to my datasette-dir: /plugins/sql_functions.py:

from datasette import hookimpl

def escape_fts_query(query):
    bits = query.split()
    return ' '.join('"{}"'.format(bit.replace('"', '')) for bit in bits)

@hookimpl
def prepare_connection(conn):
    conn.create_function("escape_fts_query", 1, escape_fts_query)

It has no effect on the standard queries to the tables though, as they still produce errors when including any characters like '-', '/', '+' or '?'

Does the function only work when using custom queries, where I can include the escape_fts function explicitly in the SQL query?

PS. I'm calling datasette with --plugins=plugins, and my other plugins work just fine.
PPS. The fts5 virtual table is created with 'sqlite3' like so:

CREATE VIRTUAL TABLE "cases_fts" USING FTS5( title, subtitle, resume, suggestion, presentation, detail = full, content_rowid = 'id', content = 'cases', tokenize='unicode61', 'remove_diacritics 2', 'tokenchars "-_"' );

Thanks!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
fts5 syntax error when using punctuation 539590148  
879477586 https://github.com/dogsheep/healthkit-to-sqlite/issues/12#issuecomment-879477586 https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/12 MDEyOklzc3VlQ29tbWVudDg3OTQ3NzU4Ng== simonw 9599 2021-07-13T23:50:06Z 2021-07-13T23:50:06Z MEMBER

Unfortunately I don't think updating the database is practical, because the export doesn't include unique identifiers which can be used to update existing records and create new ones. Recreating from scratch works around that limitation.

I've not explored workouts with SpatiaLite but that's a really good idea.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Some workout columns should be float, not text 727848625  
879309636 https://github.com/simonw/datasette/pull/1393#issuecomment-879309636 https://api.github.com/repos/simonw/datasette/issues/1393 MDEyOklzc3VlQ29tbWVudDg3OTMwOTYzNg== simonw 9599 2021-07-13T18:32:25Z 2021-07-13T18:32:25Z OWNER

Thanks

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Update deploying.rst 941412189  
879277953 https://github.com/simonw/datasette/pull/1392#issuecomment-879277953 https://api.github.com/repos/simonw/datasette/issues/1392 MDEyOklzc3VlQ29tbWVudDg3OTI3Nzk1Mw== simonw 9599 2021-07-13T17:42:31Z 2021-07-13T17:42:31Z OWNER

Thanks!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Update deploying.rst 941403676  
877874117 https://github.com/dogsheep/healthkit-to-sqlite/issues/12#issuecomment-877874117 https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/12 MDEyOklzc3VlQ29tbWVudDg3Nzg3NDExNw== Mjboothaus 956433 2021-07-11T23:03:37Z 2021-07-11T23:03:37Z NONE

P.S. Wondering if you have explored using the SpatiaLite functionality with the location data in workouts?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Some workout columns should be float, not text 727848625  
877835171 https://github.com/simonw/datasette/issues/511#issuecomment-877835171 https://api.github.com/repos/simonw/datasette/issues/511 MDEyOklzc3VlQ29tbWVudDg3NzgzNTE3MQ== simonw 9599 2021-07-11T17:23:05Z 2021-07-11T17:23:05Z OWNER
== 87 failed, 819 passed, 7 skipped, 29 errors in 2584.85s (0:43:04) ==

https://github.com/simonw/datasette/runs/3038188870?check_suite_focus=true

Full copy of log here: https://gist.github.com/simonw/4b1fdd24496b989fca56bc757be345ad

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Get Datasette tests passing on Windows in GitHub Actions 456578474  
877805513 https://github.com/dogsheep/healthkit-to-sqlite/issues/12#issuecomment-877805513 https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/12 MDEyOklzc3VlQ29tbWVudDg3NzgwNTUxMw== Mjboothaus 956433 2021-07-11T14:03:01Z 2021-07-11T14:03:01Z NONE

Hi Simon -- just experimenting with your excellent software!

Up to this point in time I have been using the (paid) HealthFit App to export my workouts from my Apple Watch, one walk at a time, into either .GPX or .FIT format, and then using another library to suck it into Python and eventually here to my "Emmaus Walking" app:

https://share.streamlit.io/mjboothaus/emmaus_walking/emmaus_walking/app.py

I just used healthkit-to-sqlite to convert my export.zip file and it all "just worked".

I did notice the issue with various numeric fields being stored in the SQLite db as TEXT for now and just thought I'd flag it - but you've already self-reported this issue.

Keep up the great work!

I was curious if you have any thoughts about periodically exporting "export.zip" and how to just update the SQLite file instead of re-creating it each time. Hopefully Apple will give some thought to managing this data in a more sensible fashion as it grows over time. Ideally one could pull it from iCloud (where it is allegedly being backed up).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Some workout columns should be float, not text 727848625  
877726495 https://github.com/simonw/datasette/issues/511#issuecomment-877726495 https://api.github.com/repos/simonw/datasette/issues/511 MDEyOklzc3VlQ29tbWVudDg3NzcyNjQ5NQ== simonw 9599 2021-07-11T01:32:27Z 2021-07-11T01:32:27Z OWNER

I'm using pytest-xdist and this:

pytest -n auto -m "not serial"

I'll try not using the -n auto bit on Windows and see if that helps.
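
For context, the serial marker pattern referenced there looks roughly like this (an illustrative test, not one of Datasette's):

import pytest

@pytest.mark.serial
def test_touches_a_shared_file(tmp_path):
    # Tests marked "serial" are excluded from the parallel xdist run by
    # -m "not serial" and executed afterwards in a single process, which
    # avoids workers fighting over the same temporary files.
    path = tmp_path / "data.db"
    path.write_bytes(b"")
    assert path.exists()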

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Get Datasette tests passing on Windows in GitHub Actions 456578474  
877726288 https://github.com/simonw/datasette/issues/511#issuecomment-877726288 https://api.github.com/repos/simonw/datasette/issues/511 MDEyOklzc3VlQ29tbWVudDg3NzcyNjI4OA== simonw 9599 2021-07-11T01:29:41Z 2021-07-11T01:29:41Z OWNER

Lots of errors that look like this:

2021-07-11T00:40:32.1189321Z E           NotADirectoryError: [WinError 267] The directory name is invalid: 'C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\tmpdr41pgwg\\data.db'
2021-07-11T00:40:32.1190083Z 
2021-07-11T00:40:32.1191128Z c:\hostedtoolcache\windows\python\3.8.10\x64\lib\shutil.py:596: NotADirectoryError
2021-07-11T00:40:32.1191999Z ___________________ ERROR at teardown of test_insert_error ____________________
2021-07-11T00:40:32.1192842Z [gw1] win32 -- Python 3.8.10 c:\hostedtoolcache\windows\python\3.8.10\x64\python.exe
2021-07-11T00:40:32.1193387Z 
2021-07-11T00:40:32.1193930Z path = 'C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\tmpry729pq_'
2021-07-11T00:40:32.1194876Z onerror = <function TemporaryDirectory._rmtree.<locals>.onerror at 0x00000291FCEA93A0>
2021-07-11T00:40:32.1195480Z 
2021-07-11T00:40:32.1195927Z     def _rmtree_unsafe(path, onerror):
2021-07-11T00:40:32.1196435Z         try:
2021-07-11T00:40:32.1196910Z             with os.scandir(path) as scandir_it:
2021-07-11T00:40:32.1197504Z                 entries = list(scandir_it)
2021-07-11T00:40:32.1198002Z         except OSError:
2021-07-11T00:40:32.1198607Z             onerror(os.scandir, path, sys.exc_info())
2021-07-11T00:40:32.1199137Z             entries = []
2021-07-11T00:40:32.1199637Z         for entry in entries:
2021-07-11T00:40:32.1200184Z             fullname = entry.path
2021-07-11T00:40:32.1200692Z             if _rmtree_isdir(entry):
2021-07-11T00:40:32.1201198Z                 try:
2021-07-11T00:40:32.1201643Z                     if entry.is_symlink():
2021-07-11T00:40:32.1202280Z                         # This can only happen if someone replaces
2021-07-11T00:40:32.1202944Z                         # a directory with a symlink after the call to
2021-07-11T00:40:32.1203623Z                         # os.scandir or entry.is_dir above.
2021-07-11T00:40:32.1204303Z                         raise OSError("Cannot call rmtree on a symbolic link")
2021-07-11T00:40:32.1204942Z                 except OSError:
2021-07-11T00:40:32.1206416Z                     onerror(os.path.islink, fullname, sys.exc_info())
2021-07-11T00:40:32.1207022Z                     continue
2021-07-11T00:40:32.1207584Z                 _rmtree_unsafe(fullname, onerror)
2021-07-11T00:40:32.1208074Z             else:
2021-07-11T00:40:32.1208496Z                 try:
2021-07-11T00:40:32.1208926Z >                   os.unlink(fullname)
2021-07-11T00:40:32.1210053Z E                   PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\tmpry729pq_\\data.db'
2021-07-11T00:40:32.1210974Z 
2021-07-11T00:40:32.1211638Z c:\hostedtoolcache\windows\python\3.8.10\x64\lib\shutil.py:616: PermissionError
2021-07-11T00:40:32.1212211Z 
2021-07-11T00:40:32.1212846Z During handling of the above exception, another exception occurred:
2021-07-11T00:40:32.1213320Z 
2021-07-11T00:40:32.1213797Z func = <built-in function unlink>
2021-07-11T00:40:32.1214529Z path = 'C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\tmpry729pq_\\data.db'
2021-07-11T00:40:32.1215763Z exc_info = (<class 'PermissionError'>, PermissionError(13, 'The process cannot access the file because it is being used by another process'), <traceback object at 0x00000291FB4D7040>)
2021-07-11T00:40:32.1217263Z 
2021-07-11T00:40:32.1217777Z     def onerror(func, path, exc_info):
2021-07-11T00:40:32.1218421Z         if issubclass(exc_info[0], PermissionError):
2021-07-11T00:40:32.1219079Z             def resetperms(path):
2021-07-11T00:40:32.1219518Z                 try:
2021-07-11T00:40:32.1219992Z                     _os.chflags(path, 0)
2021-07-11T00:40:32.1220535Z                 except AttributeError:
2021-07-11T00:40:32.1221110Z                     pass
2021-07-11T00:40:32.1221545Z                 _os.chmod(path, 0o700)
2021-07-11T00:40:32.1221984Z     
2021-07-11T00:40:32.1222330Z             try:
2021-07-11T00:40:32.1222768Z                 if path != name:
2021-07-11T00:40:32.1223332Z                     resetperms(_os.path.dirname(path))
2021-07-11T00:40:32.1223963Z                 resetperms(path)
2021-07-11T00:40:32.1224408Z     
2021-07-11T00:40:32.1224749Z                 try:
2021-07-11T00:40:32.1225954Z >                   _os.unlink(path)
2021-07-11T00:40:32.1227032Z E                   PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\tmpry729pq_\\data.db'
2021-07-11T00:40:32.1227927Z 
2021-07-11T00:40:32.1228646Z c:\hostedtoolcache\windows\python\3.8.10\x64\lib\tempfile.py:802: PermissionError
2021-07-11T00:40:32.1229200Z 
2021-07-11T00:40:32.1229842Z During handling of the above exception, another exception occurred:
2021-07-11T00:40:32.1230355Z 
2021-07-11T00:40:32.1230783Z     @pytest.fixture
2021-07-11T00:40:32.1231322Z     def canned_write_client():
2021-07-11T00:40:32.1231805Z         with make_app_client(
2021-07-11T00:40:32.1232467Z             extra_databases={"data.db": "create table names (name text)"},
2021-07-11T00:40:32.1233104Z             metadata={
2021-07-11T00:40:32.1233535Z                 "databases": {
2021-07-11T00:40:32.1233989Z                     "data": {
2021-07-11T00:40:32.1234416Z                         "queries": {
2021-07-11T00:40:32.1235001Z                             "canned_read": {"sql": "select * from names"},
2021-07-11T00:40:32.1235527Z                             "add_name": {
2021-07-11T00:40:32.1236117Z                                 "sql": "insert into names (name) values (:name)",
2021-07-11T00:40:32.1236686Z                                 "write": True,
2021-07-11T00:40:32.1237317Z                                 "on_success_redirect": "/data/add_name?success",
2021-07-11T00:40:32.1237882Z                             },
2021-07-11T00:40:32.1238331Z                             "add_name_specify_id": {
2021-07-11T00:40:32.1239009Z                                 "sql": "insert into names (rowid, name) values (:rowid, :name)",
2021-07-11T00:40:32.1239610Z                                 "write": True,
2021-07-11T00:40:32.1240259Z                                 "on_error_redirect": "/data/add_name_specify_id?error",
2021-07-11T00:40:32.1240839Z                             },
2021-07-11T00:40:32.1241320Z                             "delete_name": {
2021-07-11T00:40:32.1242504Z                                 "sql": "delete from names where rowid = :rowid",
2021-07-11T00:40:32.1243127Z                                 "write": True,
2021-07-11T00:40:32.1243721Z                                 "on_success_message": "Name deleted",
2021-07-11T00:40:32.1244282Z                                 "allow": {"id": "root"},
2021-07-11T00:40:32.1244749Z                             },
2021-07-11T00:40:32.1245959Z                             "update_name": {
2021-07-11T00:40:32.1246614Z                                 "sql": "update names set name = :name where rowid = :rowid",
2021-07-11T00:40:32.1247267Z                                 "params": ["rowid", "name", "extra"],
2021-07-11T00:40:32.1247828Z                                 "write": True,
2021-07-11T00:40:32.1248247Z                             },
2021-07-11T00:40:32.1248653Z                         }
2021-07-11T00:40:32.1249166Z                     }
2021-07-11T00:40:32.1249577Z                 }
2021-07-11T00:40:32.1249962Z             },
2021-07-11T00:40:32.1250333Z         ) as client:
2021-07-11T00:40:32.1250822Z >           yield client
2021-07-11T00:40:32.1251078Z 
2021-07-11T00:40:32.1251678Z D:\a\datasette\datasette\tests\test_canned_queries.py:43: 
2021-07-11T00:40:32.1252347Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
2021-07-11T00:40:32.1253040Z c:\hostedtoolcache\windows\python\3.8.10\x64\lib\contextlib.py:120: in __exit__
2021-07-11T00:40:32.1253759Z     next(self.gen)
2021-07-11T00:40:32.1254398Z D:\a\datasette\datasette\tests\fixtures.py:156: in make_app_client
2021-07-11T00:40:32.1255098Z     yield TestClient(ds)
2021-07-11T00:40:32.1255796Z c:\hostedtoolcache\windows\python\3.8.10\x64\lib\tempfile.py:827: in __exit__
2021-07-11T00:40:32.1256510Z     self.cleanup()
2021-07-11T00:40:32.1257200Z c:\hostedtoolcache\windows\python\3.8.10\x64\lib\tempfile.py:831: in cleanup
2021-07-11T00:40:32.1257961Z     self._rmtree(self.name)
2021-07-11T00:40:32.1258712Z c:\hostedtoolcache\windows\python\3.8.10\x64\lib\tempfile.py:813: in _rmtree
2021-07-11T00:40:32.1259487Z     _shutil.rmtree(name, onerror=onerror)
2021-07-11T00:40:32.1260280Z c:\hostedtoolcache\windows\python\3.8.10\x64\lib\shutil.py:740: in rmtree
2021-07-11T00:40:32.1261039Z     return _rmtree_unsafe(path, onerror)
2021-07-11T00:40:32.1261843Z c:\hostedtoolcache\windows\python\3.8.10\x64\lib\shutil.py:618: in _rmtree_unsafe
2021-07-11T00:40:32.1262633Z     onerror(os.unlink, fullname, sys.exc_info())
2021-07-11T00:40:32.1263456Z c:\hostedtoolcache\windows\python\3.8.10\x64\lib\tempfile.py:805: in onerror
2021-07-11T00:40:32.1264175Z     cls._rmtree(path)
2021-07-11T00:40:32.1264848Z c:\hostedtoolcache\windows\python\3.8.10\x64\lib\tempfile.py:813: in _rmtree
2021-07-11T00:40:32.1266329Z     _shutil.rmtree(name, onerror=onerror)
2021-07-11T00:40:32.1267082Z c:\hostedtoolcache\windows\python\3.8.10\x64\lib\shutil.py:740: in rmtree
2021-07-11T00:40:32.1267858Z     return _rmtree_unsafe(path, onerror)
2021-07-11T00:40:32.1268615Z c:\hostedtoolcache\windows\python\3.8.10\x64\lib\shutil.py:599: in _rmtree_unsafe
2021-07-11T00:40:32.1269440Z     onerror(os.scandir, path, sys.exc_info())
2021-07-11T00:40:32.1269979Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
2021-07-11T00:40:32.1270287Z 
2021-07-11T00:40:32.1270947Z path = 'C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\tmpry729pq_\\data.db'
2021-07-11T00:40:32.1273356Z onerror = <function TemporaryDirectory._rmtree.<locals>.onerror at 0x00000291FCF40E50>
2021-07-11T00:40:32.1273999Z 
2021-07-11T00:40:32.1274493Z     def _rmtree_unsafe(path, onerror):
2021-07-11T00:40:32.1274953Z         try:
2021-07-11T00:40:32.1275461Z >           with os.scandir(path) as scandir_it:
2021-07-11T00:40:32.1276459Z E           NotADirectoryError: [WinError 267] The directory name is invalid: 'C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\tmpry729pq_\\data.db'
2021-07-11T00:40:32.1277220Z 
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Get Datasette tests passing on Windows in GitHub Actions 456578474  
877725742 https://github.com/simonw/datasette/issues/511#issuecomment-877725742 https://api.github.com/repos/simonw/datasette/issues/511 MDEyOklzc3VlQ29tbWVudDg3NzcyNTc0Mg== simonw 9599 2021-07-11T01:25:01Z 2021-07-11T01:26:38Z OWNER

That's weird. https://github.com/simonw/datasette/runs/3037862798 finished running and came up green - but actually a TON of the tests failed on Windows. Not sure why that didn't fail the whole test suite:

https://user-images.githubusercontent.com/9599/125180192-12257000-e1ac-11eb-8657-d46b7bcdc1b2.png

Also the test suite took 50 minutes on Windows!

Here's a copy of the full log file for the tests on Python 3.8 on Windows: https://gist.github.com/simonw/2900ef33693c1bbda09188eb31c8212d

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Get Datasette tests passing on Windows in GitHub Actions 456578474  
877725193 https://github.com/simonw/datasette/issues/1388#issuecomment-877725193 https://api.github.com/repos/simonw/datasette/issues/1388 MDEyOklzc3VlQ29tbWVudDg3NzcyNTE5Mw== simonw 9599 2021-07-11T01:18:38Z 2021-07-11T01:18:38Z OWNER
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Serve using UNIX domain socket 939051549  
877721003 https://github.com/simonw/datasette/issues/1388#issuecomment-877721003 https://api.github.com/repos/simonw/datasette/issues/1388 MDEyOklzc3VlQ29tbWVudDg3NzcyMTAwMw== simonw 9599 2021-07-11T00:21:19Z 2021-07-11T00:21:19Z OWNER
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Serve using UNIX domain socket 939051549  
877718364 https://github.com/simonw/datasette/issues/511#issuecomment-877718364 https://api.github.com/repos/simonw/datasette/issues/511 MDEyOklzc3VlQ29tbWVudDg3NzcxODM2NA== simonw 9599 2021-07-10T23:54:37Z 2021-07-10T23:54:37Z OWNER

Looks like it's not even 10% of the way through, and already a bunch of errors:

https://user-images.githubusercontent.com/9599/125179059-81e12e00-e19f-11eb-94d9-0f2d9ce8afad.png

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Get Datasette tests passing on Windows in GitHub Actions 456578474  
877718286 https://github.com/simonw/datasette/issues/511#issuecomment-877718286 https://api.github.com/repos/simonw/datasette/issues/511 MDEyOklzc3VlQ29tbWVudDg3NzcxODI4Ng== simonw 9599 2021-07-10T23:53:29Z 2021-07-10T23:53:29Z OWNER
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Get Datasette tests passing on Windows in GitHub Actions 456578474  
877717791 https://github.com/simonw/datasette/issues/511#issuecomment-877717791 https://api.github.com/repos/simonw/datasette/issues/511 MDEyOklzc3VlQ29tbWVudDg3NzcxNzc5MQ== simonw 9599 2021-07-10T23:45:35Z 2021-07-10T23:45:35Z OWNER

Trying to run on Windows today, I get an error from the utils/asgi.py module.

It's trying to do from os import EX_CANTCREAT, which is Unix-only. I commented this line out, and (so far) it's working.

Good news: that line was removed in #1094.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Get Datasette tests passing on Windows in GitHub Actions 456578474  
650340914 https://github.com/simonw/datasette/pull/868#issuecomment-650340914 https://api.github.com/repos/simonw/datasette/issues/868 MDEyOklzc3VlQ29tbWVudDY1MDM0MDkxNA== codecov[bot] 22429695 2020-06-26T18:53:02Z 2021-07-10T23:41:42Z NONE

Codecov Report

Merging #868 (b452fcb) into master (0005281) will increase coverage by 0.49%.
The diff coverage is 96.19%.

:exclamation: Current head b452fcb differs from pull request most recent head c99caba. Consider uploading reports for the commit c99caba to get more accurate results

@@            Coverage Diff             @@
##           master     #868      +/-   ##
==========================================
+ Coverage   82.91%   83.40%   +0.49%     
==========================================
  Files          26       27       +1     
  Lines        3547     3634      +87     
==========================================
+ Hits         2941     3031      +90     
+ Misses        606      603       -3     
Impacted Files                           Coverage Δ
datasette/plugins.py                     82.35% <ø> (ø)
datasette/default_magic_parameters.py    91.17% <91.17%> (ø)
datasette/app.py                         95.99% <97.91%> (+1.32%) :arrow_up:
datasette/hookspecs.py                   100.00% <100.00%> (ø)
datasette/utils/__init__.py              93.93% <100.00%> (+0.08%) :arrow_up:
datasette/utils/asgi.py                  91.32% <100.00%> (+0.41%) :arrow_up:
datasette/views/base.py                  93.39% <100.00%> (-0.01%) :arrow_down:
datasette/views/database.py              96.37% <100.00%> (-1.96%) :arrow_down:
datasette/views/table.py                 95.67% <0.00%> (-0.03%) :arrow_down:
... and 6 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 180c7a5...c99caba. Read the comment docs.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
initial windows ci setup 646448486  
877717392 https://github.com/simonw/datasette/pull/557#issuecomment-877717392 https://api.github.com/repos/simonw/datasette/issues/557 MDEyOklzc3VlQ29tbWVudDg3NzcxNzM5Mg== simonw 9599 2021-07-10T23:39:48Z 2021-07-10T23:39:48Z OWNER

Abandoning this - need to switch to using GitHub Actions for this instead.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Get tests running on Windows using Travis CI 466996584  
877717262 https://github.com/simonw/datasette/issues/1388#issuecomment-877717262 https://api.github.com/repos/simonw/datasette/issues/1388 MDEyOklzc3VlQ29tbWVudDg3NzcxNzI2Mg== simonw 9599 2021-07-10T23:37:54Z 2021-07-10T23:37:54Z OWNER

I wonder if --fd is worth supporting too?

I'm going to hold off on implementing this until someone asks for it.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Serve using UNIX domain socket 939051549  
877716993 https://github.com/simonw/datasette/issues/1388#issuecomment-877716993 https://api.github.com/repos/simonw/datasette/issues/1388 MDEyOklzc3VlQ29tbWVudDg3NzcxNjk5Mw== simonw 9599 2021-07-10T23:34:02Z 2021-07-10T23:34:02Z OWNER

Figured out an example nginx configuration. Put this in nginx.conf:

daemon off;
events {
  worker_connections  1024;
}
http {
  server {
    listen 8092;
    location / {
      proxy_pass              http://datasette;
      proxy_set_header        X-Real-IP $remote_addr;
      proxy_set_header        X-Forwarded-For $proxy_add_x_forwarded_for;
    }
  }
  upstream datasette {
    server unix:/tmp/datasette.sock;
  }
}

Then run datasette --uds /tmp/datasette.sock

Then run nginx like this:

nginx -c ./nginx.conf

Then requests to http://localhost:8092/ will be proxied to Datasette.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Serve using UNIX domain socket 939051549  
877716359 https://github.com/simonw/datasette/issues/1388#issuecomment-877716359 https://api.github.com/repos/simonw/datasette/issues/1388 MDEyOklzc3VlQ29tbWVudDg3NzcxNjM1OQ== simonw 9599 2021-07-10T23:24:58Z 2021-07-10T23:24:58Z OWNER

Apparently Windows 10 has Unix domain socket support: https://bugs.python.org/issue33408

Unix socket (AF_UNIX) is now available in Windows 10 (April 2018 Update). Please add Python support for it.
More details about it on https://blogs.msdn.microsoft.com/commandline/2017/12/19/af_unix-comes-to-windows/

But it's not clear if this is going to work. That same issue thread (the issue is still open) suggests using hasattr(socket, 'AF_UNIX') to detect support in tests.
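
A minimal sketch of what that detection could look like (assuming pytest, which the test suite already uses; the test name and body here are hypothetical):

```python
import socket

import pytest


@pytest.mark.skipif(
    not hasattr(socket, "AF_UNIX"),
    reason="Unix domain sockets are not supported on this platform",
)
def test_serve_uds(tmp_path):
    # Hypothetical test body: start a server against a socket under
    # tmp_path and assert that requests over it succeed
    uds = str(tmp_path / "datasette.sock")
    ...
```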

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Serve using UNIX domain socket 939051549  
877716156 https://github.com/simonw/datasette/issues/1388#issuecomment-877716156 https://api.github.com/repos/simonw/datasette/issues/1388 MDEyOklzc3VlQ29tbWVudDg3NzcxNjE1Ng== simonw 9599 2021-07-10T23:22:21Z 2021-07-10T23:22:21Z OWNER

I don't have the Datasette test suite running on Windows yet, but I'd like it to run there some day - so ideally this test would be skipped if Unix domain sockets are not supported by the underlying operating system.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Serve using UNIX domain socket 939051549  
877715654 https://github.com/simonw/datasette/issues/1388#issuecomment-877715654 https://api.github.com/repos/simonw/datasette/issues/1388 MDEyOklzc3VlQ29tbWVudDg3NzcxNTY1NA== simonw 9599 2021-07-10T23:15:06Z 2021-07-10T23:15:06Z OWNER

I can run tests against it using httpx: https://www.python-httpx.org/advanced/#usage_1

```python
import httpx

# Connect to the Docker API via a Unix Socket.
transport = httpx.HTTPTransport(uds="/var/run/docker.sock")
client = httpx.Client(transport=transport)
response = client.get("http://docker/info")
```
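
Adapting that pattern to Datasette would presumably look something like this (a sketch; assumes a server already started with datasette data.db --uds /tmp/datasette.sock, and the socket path is illustrative):

```python
import httpx

# The hostname portion is arbitrary when connecting over a Unix domain
# socket; only the socket path matters.
transport = httpx.HTTPTransport(uds="/tmp/datasette.sock")
client = httpx.Client(transport=transport)
response = client.get("http://localhost/-/versions.json")
print(response.status_code, response.json())
```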

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Serve using UNIX domain socket 939051549  
877714698 https://github.com/simonw/datasette/issues/1388#issuecomment-877714698 https://api.github.com/repos/simonw/datasette/issues/1388 MDEyOklzc3VlQ29tbWVudDg3NzcxNDY5OA== simonw 9599 2021-07-10T23:01:37Z 2021-07-10T23:01:37Z OWNER

Can test this with:

curl --unix-socket ${socket} -i "http://localhost/" 
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Serve using UNIX domain socket 939051549  
877691558 https://github.com/simonw/datasette/issues/1391#issuecomment-877691558 https://api.github.com/repos/simonw/datasette/issues/1391 MDEyOklzc3VlQ29tbWVudDg3NzY5MTU1OA== simonw 9599 2021-07-10T19:26:57Z 2021-07-10T19:26:57Z OWNER

The https://latest.datasette.io/fixtures.db file no longer includes generated columns, which will help avoid confusion such as seen in #1376.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Stop using generated columns in fixtures.db 941300946  
877691427 https://github.com/simonw/datasette/issues/1391#issuecomment-877691427 https://api.github.com/repos/simonw/datasette/issues/1391 MDEyOklzc3VlQ29tbWVudDg3NzY5MTQyNw== simonw 9599 2021-07-10T19:26:00Z 2021-07-10T19:26:00Z OWNER

I had to run the tests locally on my macOS laptop using pysqlite3 to get a version that supported generated columns - wrote up a TIL about that here: https://til.simonwillison.net/sqlite/pysqlite3-on-macos
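
The key trick from that TIL is swapping pysqlite3 in for the standard library module before anything else imports it - roughly this (a sketch; requires pip install pysqlite3-binary):

```python
import sys

import pysqlite3

# Make subsequent "import sqlite3" statements resolve to pysqlite3,
# which bundles a newer SQLite (generated columns need SQLite 3.31+).
sys.modules["sqlite3"] = pysqlite3

import sqlite3

print(sqlite3.sqlite_version)
```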

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Stop using generated columns in fixtures.db 941300946  
877687196 https://github.com/simonw/datasette/issues/1391#issuecomment-877687196 https://api.github.com/repos/simonw/datasette/issues/1391 MDEyOklzc3VlQ29tbWVudDg3NzY4NzE5Ng== simonw 9599 2021-07-10T18:58:40Z 2021-07-10T18:58:40Z OWNER
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Stop using generated columns in fixtures.db 941300946  
877686784 https://github.com/simonw/datasette/issues/1391#issuecomment-877686784 https://api.github.com/repos/simonw/datasette/issues/1391 MDEyOklzc3VlQ29tbWVudDg3NzY4Njc4NA== simonw 9599 2021-07-10T18:56:03Z 2021-07-10T18:56:03Z OWNER
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Stop using generated columns in fixtures.db 941300946  
877682533 https://github.com/simonw/datasette/issues/1391#issuecomment-877682533 https://api.github.com/repos/simonw/datasette/issues/1391 MDEyOklzc3VlQ29tbWVudDg3NzY4MjUzMw== simonw 9599 2021-07-10T18:28:05Z 2021-07-10T18:28:05Z OWNER

Here's the test in question: https://github.com/simonw/datasette/blob/a6c55afe8c82ead8deb32f90c9324022fd422324/tests/test_api.py#L2033-L2046

Various other places in the test code also need changing - anything that calls supports_generated_columns().

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Stop using generated columns in fixtures.db 941300946  
877681031 https://github.com/simonw/datasette/issues/1389#issuecomment-877681031 https://api.github.com/repos/simonw/datasette/issues/1389 MDEyOklzc3VlQ29tbWVudDg3NzY4MTAzMQ== simonw 9599 2021-07-10T18:17:29Z 2021-07-10T18:17:29Z OWNER

I don't like ?_searchmode=default because it suggests "use the default" - but it actually overrides the default that was specified by "searchmode": "raw" in metadata.json.

I'm going with ?_searchmode=escaped instead.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
"searchmode": "raw" in table metadata 940077168  
877310125 https://github.com/simonw/datasette/issues/1390#issuecomment-877310125 https://api.github.com/repos/simonw/datasette/issues/1390 MDEyOklzc3VlQ29tbWVudDg3NzMxMDEyNQ== simonw 9599 2021-07-09T16:32:57Z 2021-07-09T16:32:57Z OWNER
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Mention restarting systemd in documentation 940891698  
877308310 https://github.com/simonw/datasette/issues/1390#issuecomment-877308310 https://api.github.com/repos/simonw/datasette/issues/1390 MDEyOklzc3VlQ29tbWVudDg3NzMwODMxMA== simonw 9599 2021-07-09T16:29:48Z 2021-07-09T16:29:48Z OWNER
sudo systemctl restart datasette.service
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Mention restarting systemd in documentation 940891698  
876721585 https://github.com/simonw/datasette/issues/268#issuecomment-876721585 https://api.github.com/repos/simonw/datasette/issues/268 MDEyOklzc3VlQ29tbWVudDg3NjcyMTU4NQ== rayvoelker 9308268 2021-07-08T20:22:17Z 2021-07-08T20:22:17Z NONE

I do like the idea of there being an option for turning that on by default so that you could use those terms in the default "Search" bar presented when you browse to a table where FTS has been enabled. Maybe even a small inline pop-up with a short bit explaining the FTS feature and the keywords (e.g. case matters). What are the side-effects of turning that on in the query string, or even by default as you suggested? I see that you stated in the docs... "to ensure they do not cause any confusion for users who are not aware of them", but I'm not sure what those could be.

Isn't it the case that those keywords are only picked up by SQLite where you're using the MATCH clause?

Seems like a really powerful feature (even though there are a lot of hurdles around setting it up in the sqlite db ... sqlite-utils makes that so simple by the way!)
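
For what it's worth, those operators only have special meaning inside a MATCH expression - a quick illustration (assuming an SQLite build with FTS5; the table and rows are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE VIRTUAL TABLE books_fts USING fts5(title);
    INSERT INTO books_fts (title) VALUES ('dogs and cats'), ('cats only');
    """
)
# Here AND is an FTS5 operator, not a literal word to search for
rows = conn.execute(
    "SELECT title FROM books_fts WHERE books_fts MATCH ?", ["dogs AND cats"]
).fetchall()
print(rows)  # [('dogs and cats',)]
```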

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Mechanism for ranking results from SQLite full-text search 323718842  
876620095 https://github.com/simonw/datasette/issues/1389#issuecomment-876620095 https://api.github.com/repos/simonw/datasette/issues/1389 MDEyOklzc3VlQ29tbWVudDg3NjYyMDA5NQ== simonw 9599 2021-07-08T17:35:09Z 2021-07-08T17:35:09Z OWNER
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
"searchmode": "raw" in table metadata 940077168  
876619531 https://github.com/simonw/datasette/issues/1389#issuecomment-876619531 https://api.github.com/repos/simonw/datasette/issues/1389 MDEyOklzc3VlQ29tbWVudDg3NjYxOTUzMQ== simonw 9599 2021-07-08T17:34:16Z 2021-07-08T17:34:16Z OWNER

If I implement this I'll also set it up so ?_searchmode=default can be used to override that and go back to the default behaviour.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
"searchmode": "raw" in table metadata 940077168  
876619271 https://github.com/simonw/datasette/issues/1389#issuecomment-876619271 https://api.github.com/repos/simonw/datasette/issues/1389 MDEyOklzc3VlQ29tbWVudDg3NjYxOTI3MQ== simonw 9599 2021-07-08T17:33:49Z 2021-07-08T17:33:49Z OWNER
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
"searchmode": "raw" in table metadata 940077168  
876618847 https://github.com/simonw/datasette/issues/1389#issuecomment-876618847 https://api.github.com/repos/simonw/datasette/issues/1389 MDEyOklzc3VlQ29tbWVudDg3NjYxODg0Nw== simonw 9599 2021-07-08T17:33:08Z 2021-07-08T17:33:08Z OWNER
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
"searchmode": "raw" in table metadata 940077168  
876618582 https://github.com/simonw/datasette/issues/1389#issuecomment-876618582 https://api.github.com/repos/simonw/datasette/issues/1389 MDEyOklzc3VlQ29tbWVudDg3NjYxODU4Mg== simonw 9599 2021-07-08T17:32:40Z 2021-07-08T17:32:40Z OWNER

This makes sense to me since other useful query string arguments like this can be set as defaults in the metadata.json configuration.
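
For illustration, the shape under discussion would presumably look something like this in metadata.json (the database and table names here are hypothetical, and the exact key layout was still being designed in this issue):

{
    "databases": {
        "mydb": {
            "tables": {
                "documents": {
                    "searchmode": "raw"
                }
            }
        }
    }
}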

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
"searchmode": "raw" in table metadata 940077168  
876616414 https://github.com/simonw/datasette/issues/268#issuecomment-876616414 https://api.github.com/repos/simonw/datasette/issues/268 MDEyOklzc3VlQ29tbWVudDg3NjYxNjQxNA== simonw 9599 2021-07-08T17:29:04Z 2021-07-08T17:29:04Z OWNER

I had set up a full text search on my instance of Datasette for title data for our public library, and was noticing that some of the features of the SQLite FTS weren't working as expected ... and maybe the issue is in the escape_fts() function

That's a deliberate feature (albeit controversial, see #759) - part of the main problem here is that it's easy to construct a SQLite full-text search string which results in a database error. This is a bad user experience!

You can opt-in to raw SQL queries by appending ?_searchmode=raw to the page, see https://docs.datasette.io/en/stable/full_text_search.html#advanced-sqlite-search-queries

But maybe there should be an option for turning that on by default without needing the query string?
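
The escaping in question boils down to quoting each term so FTS operators are treated as literal tokens - a simplified sketch, not Datasette's exact implementation:

```python
def escape_fts(query):
    # Simplified: quote every whitespace-separated token so words like
    # AND, OR and NEAR lose their special FTS meaning.
    return " ".join(
        '"{}"'.format(token.replace('"', '""')) for token in query.split()
    )

print(escape_fts("dogs AND cats"))  # "dogs" "AND" "cats"
```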

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Mechanism for ranking results from SQLite full-text search 323718842  
876428348 https://github.com/simonw/datasette/issues/268#issuecomment-876428348 https://api.github.com/repos/simonw/datasette/issues/268 MDEyOklzc3VlQ29tbWVudDg3NjQyODM0OA== rayvoelker 9308268 2021-07-08T13:13:12Z 2021-07-08T13:13:12Z NONE

I had set up a full text search on my instance of Datasette for title data for our public library, and was noticing that some of the features of the SQLite FTS weren't working as expected ... and maybe the issue is in the escape_fts() function


[screenshots comparing the search results with escape_fts() in place vs. removing the function]

Also, on the issue of sorting by rank by default... perhaps something like this could work for the baked-in default SQL query for Datasette?

[link to the above search in my instance of Datasette]

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Mechanism for ranking results from SQLite full-text search 323718842  
876213177 https://github.com/simonw/datasette/issues/1388#issuecomment-876213177 https://api.github.com/repos/simonw/datasette/issues/1388 MDEyOklzc3VlQ29tbWVudDg3NjIxMzE3Nw== aslakr 80737 2021-07-08T07:47:17Z 2021-07-08T07:47:17Z CONTRIBUTOR

This sounds like a valuable feature for people running Datasette behind a proxy.

Yes, in some cases it is easier to use e.g. Apache's ProxyPass Directive with Unix Domain Socket like unix:/home/www.socket|http://localhost/whatever/.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Serve using UNIX domain socket 939051549  
875742910 https://github.com/simonw/datasette/issues/1388#issuecomment-875742910 https://api.github.com/repos/simonw/datasette/issues/1388 MDEyOklzc3VlQ29tbWVudDg3NTc0MjkxMA== simonw 9599 2021-07-07T16:20:50Z 2021-07-07T16:23:02Z OWNER

I wonder if --fd is worth supporting too? Uvicorn documentation says that's useful for running under process managers; I'd want to understand exactly how to use that (and test it) before adding the feature, though.

https://www.uvicorn.org/settings/

Docs on how to use a process manager: https://www.uvicorn.org/deployment/#supervisor

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Serve using UNIX domain socket 939051549  
875741410 https://github.com/simonw/datasette/issues/1388#issuecomment-875741410 https://api.github.com/repos/simonw/datasette/issues/1388 MDEyOklzc3VlQ29tbWVudDg3NTc0MTQxMA== simonw 9599 2021-07-07T16:18:50Z 2021-07-07T16:18:50Z OWNER

You could actually run Datasette like this today without modifications by running a thin Python script that imports from datasette.app, instantiates the ASGI app and passes that to uvicorn.run - but I like this as a supported feature too.
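
Such a script could be as small as this (a rough sketch; the constructor arguments and socket path are illustrative and may differ between Datasette versions):

```python
import uvicorn
from datasette.app import Datasette

# Build the ASGI application the same way "datasette serve" does
ds = Datasette(["data.db"])
app = ds.app()

if __name__ == "__main__":
    uvicorn.run(app, uds="/tmp/datasette.sock")
```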

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Serve using UNIX domain socket 939051549  
875740085 https://github.com/simonw/datasette/issues/1388#issuecomment-875740085 https://api.github.com/repos/simonw/datasette/issues/1388 MDEyOklzc3VlQ29tbWVudDg3NTc0MDA4NQ== simonw 9599 2021-07-07T16:17:08Z 2021-07-07T16:17:08Z OWNER

Looks pretty easy to implement - here's a hint from Uvicorn source code: https://github.com/encode/uvicorn/blob/b5af1049e63c059dc750a450c807b9768f91e906/uvicorn/main.py#L368

Need to work out a simple pattern for testing this too.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Serve using UNIX domain socket 939051549  
875738149 https://github.com/simonw/datasette/issues/1388#issuecomment-875738149 https://api.github.com/repos/simonw/datasette/issues/1388 MDEyOklzc3VlQ29tbWVudDg3NTczODE0OQ== simonw 9599 2021-07-07T16:14:29Z 2021-07-07T16:14:29Z OWNER

This sounds like a valuable feature for people running Datasette behind a proxy.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Serve using UNIX domain socket 939051549  
873166836 https://github.com/simonw/datasette/issues/1387#issuecomment-873166836 https://api.github.com/repos/simonw/datasette/issues/1387 MDEyOklzc3VlQ29tbWVudDg3MzE2NjgzNg== rayvoelker 9308268 2021-07-02T17:58:23Z 2021-07-02T17:58:23Z NONE

Thanks Simon for nailing that one down! It does seem a little confusing that the ProxyPreserveHost option is set to Off by default, but this config totally did the trick and fixed the issue.

<Location /collection-analysis/>
   ProxyPass http://127.0.0.1:8010/collection-analysis/
   ProxyPreserveHost On
</Location>
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
absolute_url() behind a proxy assembles incorrect http://127.0.0.1:8001/ URLs 935930820  
873156408 https://github.com/simonw/datasette/issues/1387#issuecomment-873156408 https://api.github.com/repos/simonw/datasette/issues/1387 MDEyOklzc3VlQ29tbWVudDg3MzE1NjQwOA== simonw 9599 2021-07-02T17:37:30Z 2021-07-02T17:37:30Z OWNER
{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
absolute_url() behind a proxy assembles incorrect http://127.0.0.1:8001/ URLs 935930820  
873141222 https://github.com/simonw/datasette/issues/1387#issuecomment-873141222 https://api.github.com/repos/simonw/datasette/issues/1387 MDEyOklzc3VlQ29tbWVudDg3MzE0MTIyMg== simonw 9599 2021-07-02T17:09:32Z 2021-07-02T17:09:32Z OWNER

I'm going to add this to the suggested Apache configuration at https://docs.datasette.io/en/stable/deploying.html#apache-proxy-configuration

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
absolute_url() behind a proxy assembles incorrect http://127.0.0.1:8001/ URLs 935930820  
873140742 https://github.com/simonw/datasette/issues/1387#issuecomment-873140742 https://api.github.com/repos/simonw/datasette/issues/1387 MDEyOklzc3VlQ29tbWVudDg3MzE0MDc0Mg== simonw 9599 2021-07-02T17:08:40Z 2021-07-02T17:08:40Z OWNER

ProxyPreserveHost On is the Apache setting - it defaults to Off: https://httpd.apache.org/docs/2.4/mod/mod_proxy.html#proxypreservehost

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
absolute_url() behind a proxy assembles incorrect http://127.0.0.1:8001/ URLs 935930820  
873139138 https://github.com/simonw/datasette/issues/1387#issuecomment-873139138 https://api.github.com/repos/simonw/datasette/issues/1387 MDEyOklzc3VlQ29tbWVudDg3MzEzOTEzOA== simonw 9599 2021-07-02T17:05:47Z 2021-07-02T17:05:47Z OWNER

In this case the proxy is Apache. So there are a couple of potential fixes:

  • Configure Apache to pass the original HTTP request Host: header through to the proxied application. This should then be documented.
  • Add a new optional feature to Datasette called something like base_host which, if set, is always used in place of the host in request.url when constructing new URLs.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
absolute_url() behind a proxy assembles incorrect http://127.0.0.1:8001/ URLs 935930820  
873137935 https://github.com/simonw/datasette/issues/1387#issuecomment-873137935 https://api.github.com/repos/simonw/datasette/issues/1387 MDEyOklzc3VlQ29tbWVudDg3MzEzNzkzNQ== simonw 9599 2021-07-02T17:03:36Z 2021-07-02T17:03:36Z OWNER

And the links to apply a facet value are broken too! https://ilsweb.cincinnatilibrary.org/collection-analysis/current_collection-3d4a4b7/bib?_facet=bib_level_callnumber

        {
          "value": "g l fiction",
          "label": "g l fiction",
          "count": 212,
          "toggle_url": "https://127.0.0.1:8010/collection-analysis/current_collection-3d4a4b7/bib.json?_facet=bib_level_callnumber&bib_level_callnumber=g+l+fiction",
          "selected": false
        }

Same problem: https://github.com/simonw/datasette/blob/ea627baccf980d7d8ebc9e1ffff1fe34d556e56f/datasette/facets.py#L251-L261

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
absolute_url() behind a proxy assembles incorrect http://127.0.0.1:8001/ URLs 935930820  
873136440 https://github.com/simonw/datasette/issues/1387#issuecomment-873136440 https://api.github.com/repos/simonw/datasette/issues/1387 MDEyOklzc3VlQ29tbWVudDg3MzEzNjQ0MA== simonw 9599 2021-07-02T17:01:48Z 2021-07-02T17:01:48Z OWNER

Here's what's happening: https://github.com/simonw/datasette/blob/d23a2671386187f61872b9f6b58e0f80ac61f8fe/datasette/views/table.py#L827-L829

This is being run through absolute_url() - defined here: https://github.com/simonw/datasette/blob/d23a2671386187f61872b9f6b58e0f80ac61f8fe/datasette/app.py#L633-L637

That's because the next_url in the JSON needs to be a full URL that a client can retrieve - as opposed to the other links on that page which are all relative links that start with /: https://github.com/simonw/datasette/blob/ea627baccf980d7d8ebc9e1ffff1fe34d556e56f/datasette/templates/_table.html#L11-L15
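
The behaviour being described reduces to roughly this (a simplification of the linked absolute_url() code, not a verbatim copy):

```python
from urllib.parse import urljoin

def absolute_url(request, path):
    # The base comes from request.url, so whatever Host header the
    # proxy forwards ends up in the fully-qualified next_url.
    return urljoin(request.url, path)
```

With Apache's ProxyPreserveHost defaulting to Off, request.url is built from the backend address (127.0.0.1:8010 here), which is how that host leaks into the JSON.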

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
absolute_url() behind a proxy assembles incorrect http://127.0.0.1:8001/ URLs 935930820  
873134866 https://github.com/simonw/datasette/issues/1387#issuecomment-873134866 https://api.github.com/repos/simonw/datasette/issues/1387 MDEyOklzc3VlQ29tbWVudDg3MzEzNDg2Ng== simonw 9599 2021-07-02T16:58:52Z 2021-07-02T16:58:52Z OWNER

What's weird here is that the URL itself is correct - it starts with /collection-analysis/ as expected - but the hostname is wrong.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
absolute_url() behind a proxy assembles incorrect http://127.0.0.1:8001/ URLs 935930820  
869812567 https://github.com/simonw/datasette/issues/1101#issuecomment-869812567 https://api.github.com/repos/simonw/datasette/issues/1101 MDEyOklzc3VlQ29tbWVudDg2OTgxMjU2Nw== simonw 9599 2021-06-28T16:06:57Z 2021-06-28T16:07:24Z OWNER

Relevant blog post: https://simonwillison.net/2021/Jun/25/streaming-large-api-responses/ - including notes on efficiently streaming formats with some kind of separator in between the records (regular JSON).

Some export formats are friendlier for streaming than others. CSV and TSV are pretty easy to stream, as is newline-delimited JSON.

Regular JSON requires a bit more thought: you can output a [ character, then output each row in a stream with a comma suffix, then skip the comma for the last row and output a ]. Doing that requires peeking ahead (looping two at a time) to verify that you haven't yet reached the end.

Or... Martin De Wulf pointed out that you can output the first row, then output every subsequent row with a preceding comma, which avoids the whole "iterate two at a time" problem entirely.
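
That last approach is straightforward to express as a generator - a minimal sketch (function and variable names are illustrative):

```python
import json

def stream_json_array(rows):
    # Emit the opening bracket, then the first row bare and every
    # subsequent row with a preceding comma - no look-ahead required.
    yield "["
    for i, row in enumerate(rows):
        if i:
            yield ","
        yield json.dumps(row)
    yield "]"

print("".join(stream_json_array([{"id": 1}, {"id": 2}])))
```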

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
register_output_renderer() should support streaming data 749283032  
869677851 https://github.com/simonw/datasette/pull/1386#issuecomment-869677851 https://api.github.com/repos/simonw/datasette/issues/1386 MDEyOklzc3VlQ29tbWVudDg2OTY3Nzg1MQ== codecov[bot] 22429695 2021-06-28T13:18:50Z 2021-06-28T13:18:50Z NONE

Codecov Report

Merging #1386 (e974ed1) into main (ea627ba) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main    #1386   +/-   ##
=======================================
  Coverage   91.70%   91.70%           
=======================================
  Files          34       34           
  Lines        4364     4364           
=======================================
  Hits         4002     4002           
  Misses        362      362           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ea627ba...e974ed1. Read the comment docs.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Update asgiref requirement from <3.4.0,>=3.2.10 to >=3.2.10,<3.5.0 931557895  
869191854 https://github.com/simonw/datasette/issues/1101#issuecomment-869191854 https://api.github.com/repos/simonw/datasette/issues/1101 MDEyOklzc3VlQ29tbWVudDg2OTE5MTg1NA== eyeseast 25778 2021-06-27T16:42:14Z 2021-06-27T16:42:14Z CONTRIBUTOR

This would really help with this issue: https://github.com/eyeseast/datasette-geojson/issues/7

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
register_output_renderer() should support streaming data 749283032  
869105782 https://github.com/simonw/datasette/pull/1385#issuecomment-869105782 https://api.github.com/repos/simonw/datasette/issues/1385 MDEyOklzc3VlQ29tbWVudDg2OTEwNTc4Mg== codecov[bot] 22429695 2021-06-27T05:48:55Z 2021-06-27T05:48:55Z NONE

Codecov Report

Merging #1385 (db78094) into main (ea627ba) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main    #1385   +/-   ##
=======================================
  Coverage   91.70%   91.70%           
=======================================
  Files          34       34           
  Lines        4364     4364           
=======================================
  Hits         4002     4002           
  Misses        362      362           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ea627ba...db78094. Read the comment docs.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Fix + improve get_metadata plugin hook docs 930855052  
869076254 https://github.com/simonw/datasette/issues/1168#issuecomment-869076254 https://api.github.com/repos/simonw/datasette/issues/1168 MDEyOklzc3VlQ29tbWVudDg2OTA3NjI1NA== brandonrobertz 2670795 2021-06-27T00:03:16Z 2021-06-27T00:05:51Z CONTRIBUTOR

Related: Here's an implementation of a get_metadata() plugin hook by @brandonrobertz: next-LI@3fd8ce9

Here's a plugin that implements metadata-within-DBs: next-LI/datasette-live-config

How it works: If a database has a __metadata table, then it gets parsed and included in the global metadata. It also implements a database-action hook with a UI for managing config.

More context: https://github.com/next-LI/datasette-live-config/blob/72e335e887f1c69c54c6c2441e07148955b0fc9f/datasette_live_config/__init__.py#L109-L140
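
As a flavor of the approach, a loose sketch following the description above (not the plugin's actual code; the hook signature follows the linked implementation):

```python
import json
import sqlite3

from datasette import hookimpl


@hookimpl
def get_metadata(datasette, key, database, table):
    # Look for a __metadata key/value table in each attached database
    # and merge whatever it contains into the metadata Datasette sees.
    metadata = {}
    for name, db in datasette.databases.items():
        if db.path is None:
            continue  # skip in-memory databases
        conn = sqlite3.connect(db.path)
        try:
            rows = conn.execute("SELECT key, value FROM __metadata").fetchall()
        except sqlite3.OperationalError:
            continue  # no __metadata table in this database
        finally:
            conn.close()
        db_meta = metadata.setdefault("databases", {}).setdefault(name, {})
        for k, v in rows:
            db_meta[k] = json.loads(v)
    return metadata
```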

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Mechanism for storing metadata in _metadata tables 777333388  
869075395 https://github.com/simonw/datasette/issues/1384#issuecomment-869075395 https://api.github.com/repos/simonw/datasette/issues/1384 MDEyOklzc3VlQ29tbWVudDg2OTA3NTM5NQ== simonw 9599 2021-06-26T23:54:21Z 2021-06-26T23:59:21Z OWNER

(It may well be that implementing #1168 involves a switch to async metadata)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Plugin hook for dynamic metadata 930807135  

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
, [performed_via_github_app] TEXT);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);