html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,issue,performed_via_github_app
https://github.com/simonw/datasette/issues/1439#issuecomment-1045027067,https://api.github.com/repos/simonw/datasette/issues/1439,1045027067,IC_kwDOBm6k_c4-Sdj7,9599,2022-02-18T19:03:26Z,2022-02-18T19:03:26Z,OWNER,"(If I make this change it may break some existing Datasette installations when they upgrade - I could try and build a plugin for them which triggers on 404s and checks to see if the old format would return a 200 response, then returns that.)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-1045032377,https://api.github.com/repos/simonw/datasette/issues/1439,1045032377,IC_kwDOBm6k_c4-Se25,9599,2022-02-18T19:06:50Z,2022-02-18T19:06:50Z,OWNER,"How does URL routing for https://latest.datasette.io/fixtures/table%2Fwith%2Fslashes.csv work?
Right now it's https://github.com/simonw/datasette/blob/7d24fd405f3c60e4c852c5d746c91aa2ba23cf5b/datasette/app.py#L1098-L1101
That's not going to capture the dot-dash encoding version of that table name:
```pycon
>>> dot_dash_encode(""table/with/slashes.csv"")
'table-/with-/slashes-.csv'
```
Probably needs a fancy regex trick like a negative lookbehind assertion or similar.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-1045055772,https://api.github.com/repos/simonw/datasette/issues/1439,1045055772,IC_kwDOBm6k_c4-Skkc,9599,2022-02-18T19:23:33Z,2022-02-18T19:25:42Z,OWNER,"I want a match for this URL:
/db/table-/with-/slashes-.csv
Maybe this:
^/(?P[^/]+)/(?P([^/]*|(\-/)*|(\-\.)*|(\.\.)*)*$)
Here we are matching a sequence of:
([^/]*|(\-/)*|(\-\.)*|(\-\-)*)*
So a combination of not-slashes OR -/ or -. Or -- sequences
^/(?P[^/]+)/(?P([^/]*|(\-/)*|(\-\.)*|(\-\-)*)*$)
Try that with non-capturing bits:
^/(?P[^/]+)/(?P(?:[^/]*|(?:\-/)*|(?:\-\.)*|(?:\-\-)*)*$)
`(?:[^/]*|(?:\-/)*|(?:\-\.)*|(?:\-\-)*)*` visualized is:
Here's the explanation on regex101.com https://regex101.com/r/CPnsIO/1
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-1045059427,https://api.github.com/repos/simonw/datasette/issues/1439,1045059427,IC_kwDOBm6k_c4-Sldj,9599,2022-02-18T19:26:25Z,2022-02-18T19:26:25Z,OWNER,"With this new pattern I could probably extract out the optional `.json` format string as part of the initial route capturing regex too, rather than the current `table_and_format` hack.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-1045075207,https://api.github.com/repos/simonw/datasette/issues/1439,1045075207,IC_kwDOBm6k_c4-SpUH,9599,2022-02-18T19:39:35Z,2022-02-18T19:40:13Z,OWNER,"> And if for some horific reason you had a table with the name `/db/table-.csv.csv` (so `/db/` was the first part of the actual table name in SQLite) the URLs would look like this:
>
> * `/db/%2Fdb%2Ftable---.csv-.csv` - the HTML version
> * `/db/%2Fdb%2Ftable---.csv-.csv.csv` - the CSV version
> * `/db/%2Fdb%2Ftable---.csv-.csv.json` - the JSON version
Here's what those look like with the updated version of `dot_dash_encode()` that also encodes `/` as `-/`:
- `/db/-/db-/table---.csv-.csv` - HTML
- `/db/-/db-/table---.csv-.csv.csv` - CSV
- `/db/-/db-/table---.csv-.csv.json` - JSON
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-1045077590,https://api.github.com/repos/simonw/datasette/issues/1439,1045077590,IC_kwDOBm6k_c4-Sp5W,9599,2022-02-18T19:41:37Z,2022-02-18T19:42:41Z,OWNER,"Ugh, one disadvantage I just spotted with this: Datasette already has a `/-/versions.json` convention where ""system"" URLs are namespaced under `/-/` - but that could be confused under this new scheme with the `-/` escaping sequence.
And I've thought about adding `/db/-/special` and `/db/table/-/special` URLs in the past too.
Maybe change this system to use `.` as the escaping character instead of `-`?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-1045082891,https://api.github.com/repos/simonw/datasette/issues/1439,1045082891,IC_kwDOBm6k_c4-SrML,9599,2022-02-18T19:45:32Z,2022-02-18T19:45:32Z,OWNER,"```pycon
>>> dot_encode(""/db/table-.csv.csv"")
'./db./table-..csv..csv'
>>> dot_decode('./db./table-..csv..csv')
'/db/table-.csv.csv'
```
I worry that web servers might treat `./` in a special way though.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-1045086033,https://api.github.com/repos/simonw/datasette/issues/1439,1045086033,IC_kwDOBm6k_c4-Sr9R,9599,2022-02-18T19:47:43Z,2022-02-18T19:51:11Z,OWNER,"- https://datasette.io/-/asgi-scope/db/./db./table-..csv..csv
- https://til.simonwillison.net/-/asgi-scope/db/./db./table-..csv..csv
Do both of those survive the round-trip to populate `raw_path` correctly?
No! In both cases the `/./` bit goes missing.
It looks like this might even be a client issue - `curl` shows me this:
```
~ % curl -vv -i 'https://datasette.io/-/asgi-scope/db/./db./table-..csv..csv'
* Trying 216.239.32.21:443...
* Connected to datasette.io (216.239.32.21) port 443 (#0)
* ALPN, offering http/1.1
* TLS 1.2 connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
* Server certificate: datasette.io
* Server certificate: R3
* Server certificate: ISRG Root X1
> GET /-/asgi-scope/db/db./table-..csv..csv HTTP/1.1
```
So `curl` decided to turn `/-/asgi-scope/db/./db./table` into `/-/asgi-scope/db/db./table` before even sending the request.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-1045081042,https://api.github.com/repos/simonw/datasette/issues/1439,1045081042,IC_kwDOBm6k_c4-SqvS,9599,2022-02-18T19:44:12Z,2022-02-18T19:51:34Z,OWNER,"```python
def dot_encode(s):
return s.replace(""."", "".."").replace(""/"", ""./"")
def dot_decode(s):
return s.replace(""./"", ""/"").replace("".."", ""."")
```
No need for hyphen encoding in this variant at all, which simplifies things a bit.
(Update: this is flawed, see https://github.com/simonw/datasette/issues/1439#issuecomment-1045086033)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-1045095348,https://api.github.com/repos/simonw/datasette/issues/1439,1045095348,IC_kwDOBm6k_c4-SuO0,9599,2022-02-18T19:53:48Z,2022-02-18T19:53:48Z,OWNER,"> Ugh, one disadvantage I just spotted with this: Datasette already has a `/-/versions.json` convention where ""system"" URLs are namespaced under `/-/` - but that could be confused under this new scheme with the `-/` escaping sequence.
>
> And I've thought about adding `/db/-/special` and `/db/table/-/special` URLs in the past too.
I don't think this matters. The new regex does indeed capture that kind of page:
But Datasette goes through configured route regular expressions in order - so I can have the regex that captures `/db/-/special` routes listed before the one that captures tables and formats.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-1045024276,https://api.github.com/repos/simonw/datasette/issues/1439,1045024276,IC_kwDOBm6k_c4-Sc4U,9599,2022-02-18T19:01:42Z,2022-02-18T19:55:24Z,OWNER,"> Maybe I should use `-/` to encode forward slashes too, to defend against any ASGI servers that might not implement `raw_path` correctly.
```python
def dash_encode(s):
return s.replace(""-"", ""--"").replace(""."", ""-."").replace(""/"", ""-/"")
def dash_decode(s):
return s.replace(""-/"", ""/"").replace(""-."", ""."").replace(""--"", ""-"")
```
```pycon
>>> dash_encode(""foo/bar/baz.csv"")
'foo-/bar-/baz-.csv'
>>> dash_decode('foo-/bar-/baz-.csv')
'foo/bar/baz.csv'
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-1045099290,https://api.github.com/repos/simonw/datasette/issues/1439,1045099290,IC_kwDOBm6k_c4-SvMa,9599,2022-02-18T19:56:18Z,2022-02-18T19:56:30Z,OWNER,"> ```python
> def dash_encode(s):
> return s.replace(""-"", ""--"").replace(""."", ""-."").replace(""/"", ""-/"")
>
> def dash_decode(s):
> return s.replace(""-/"", ""/"").replace(""-."", ""."").replace(""--"", ""-"")
> ```
I think **dash-encoding** (new name for this) is the right way forward here.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-1045111309,https://api.github.com/repos/simonw/datasette/issues/1439,1045111309,IC_kwDOBm6k_c4-SyIN,9599,2022-02-18T20:04:24Z,2022-02-18T20:05:40Z,OWNER,"This made me worry that my current `dash_decode()` implementation had unknown round-trip bugs, but thankfully this works OK:
```pycon
>>> dash_encode(""/db/table-.csv.csv"")
'-/db-/table---.csv-.csv'
>>> dash_encode('-/db-/table---.csv-.csv')
'---/db---/table-------.csv---.csv'
>>> dash_decode('---/db---/table-------.csv---.csv')
'-/db-/table---.csv-.csv'
>>> dash_decode('-/db-/table---.csv-.csv')
'/db/table-.csv.csv'
```
The regex still works against that double-encoded example too:
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-1045108611,https://api.github.com/repos/simonw/datasette/issues/1439,1045108611,IC_kwDOBm6k_c4-SxeD,9599,2022-02-18T20:02:19Z,2022-02-18T20:08:34Z,OWNER,"One other potential variant:
```python
def dash_encode(s):
return s.replace(""-"", ""-dash-"").replace(""."", ""-dot-"").replace(""/"", ""-slash-"")
def dash_decode(s):
return s.replace(""-slash-"", ""/"").replace(""-dot-"", ""."").replace(""-dash-"", ""-"")
```
Except this has bugs - it doesn't round-trip safely, because it can get confused about things like `-dash-slash-` in terms of is that a `-dash-` or a `-slash-`?
```pycon
>>> dash_encode(""/db/table-.csv.csv"")
'-slash-db-slash-table-dash--dot-csv-dot-csv'
>>> dash_decode('-slash-db-slash-table-dash--dot-csv-dot-csv')
'/db/table-.csv.csv'
>>> dash_encode('-slash-db-slash-table-dash--dot-csv-dot-csv')
'-dash-slash-dash-db-dash-slash-dash-table-dash-dash-dash--dash-dot-dash-csv-dash-dot-dash-csv'
>>> dash_decode('-dash-slash-dash-db-dash-slash-dash-table-dash-dash-dash--dash-dot-dash-csv-dash-dot-dash-csv')
'-dash/dash-db-dash/dash-table-dash--dash.dash-csv-dash.dash-csv'
```
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-1045117304,https://api.github.com/repos/simonw/datasette/issues/1439,1045117304,IC_kwDOBm6k_c4-Szl4,9599,2022-02-18T20:09:22Z,2022-02-18T20:09:22Z,OWNER,Adopting this could result in supporting database files with surprising characters in their filename too.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-1045131086,https://api.github.com/repos/simonw/datasette/issues/1439,1045131086,IC_kwDOBm6k_c4-S29O,9599,2022-02-18T20:22:13Z,2022-02-18T20:22:47Z,OWNER,"Should it encode `%` symbols too, since they have a special meaning in URLs and we can't guarantee that every single web server / proxy out there will round-trip them safely using percentage encoding? If so, would need to pick a different encoding character for them. Maybe `%` becomes `-p` - and in that case `/` could become `-s` too.
Is it worth expanding dash-encoding outside of just `/` and `-` and `.` though? Not sure.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-1045134050,https://api.github.com/repos/simonw/datasette/issues/1439,1045134050,IC_kwDOBm6k_c4-S3ri,9599,2022-02-18T20:25:04Z,2022-02-18T20:25:04Z,OWNER,Here's a useful modern spec for how existing URL percentage encoding is supposed to work: https://url.spec.whatwg.org/#percent-encoded-bytes,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-1045269544,https://api.github.com/repos/simonw/datasette/issues/1439,1045269544,IC_kwDOBm6k_c4-TYwo,9599,2022-02-18T22:19:29Z,2022-02-18T22:19:29Z,OWNER,"Note that I've ruled out using `Accept: application/json` to return JSON because it turns out Cloudflare and potentially other CDNs ignore the `Vary: Accept` header entirely:
- https://github.com/simonw/datasette/issues/1534","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-1045069481,https://api.github.com/repos/simonw/datasette/issues/1439,1045069481,IC_kwDOBm6k_c4-Sn6p,9599,2022-02-18T19:34:41Z,2022-03-05T21:32:22Z,OWNER,"I think I got format extraction working! https://regex101.com/r/A0bW1D/1
^/(?P[^/]+)/(?P(?:[^\/\-\.]*|(?:\-/)*|(?:\-\.)*|(?:\-\-)*)*?)(?:(?\w+))?$
I had to make that crazy inner one even more complicated to stop it from capturing `.` that was not part of `-.`.
(?:[^\/\-\.]*|(?:\-/)*|(?:\-\.)*|(?:\-\-)*)*
Visualized:
So now I have a regex which can extract out the dot-encoded table name AND spot if there is an optional `.format` at the end:
If I end up using this in Datasette it's going to need VERY comprehensive unit tests and inline documentation.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,