{"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1068461449", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1068461449, "node_id": "IC_kwDOBm6k_c4_r22J", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-03-15T20:51:26Z", "updated_at": "2022-03-15T20:51:26Z", "author_association": "OWNER", "body": "I'm happy with this now that I've landed Tilde encoding in #1657.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. ?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1065988403", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1065988403, "node_id": "IC_kwDOBm6k_c4_ibEz", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-03-13T00:06:38Z", "updated_at": "2022-03-13T00:07:19Z", "author_association": "OWNER", "body": "If I want to reserve `-` as a character that CAN be used in URLs, the only remaining character that might make sense for escape sequences is `~` - based on this last line of characters that are exempt from percentage encoding:\r\n\r\n```python\r\n_ALWAYS_SAFE = frozenset(b'ABCDEFGHIJKLMNOPQRSTUVWXYZ'\r\n b'abcdefghijklmnopqrstuvwxyz'\r\n b'0123456789'\r\n b'_.-~')\r\n```\r\nSo I'd add both `-` and `_` back to the safe list, but use `~` to escape `.` and `/` and suchlike.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. 
?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1065987808", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1065987808, "node_id": "IC_kwDOBm6k_c4_ia7g", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-03-13T00:02:32Z", "updated_at": "2022-03-13T00:02:32Z", "author_association": "OWNER", "body": "OK, this has broken a lot more than I expected it would.\r\n\r\nTurns out `-` is a very common character in existing Datasette database names!\r\n\r\nhttps://datasette.io/-/databases for example has two:\r\n\r\n```json\r\n[\r\n {\r\n \"name\": \"docs-index\",\r\n \"path\": \"docs-index.db\",\r\n \"size\": 1007616,\r\n \"is_mutable\": false,\r\n \"is_memory\": false,\r\n \"hash\": \"0ac6c3de2762fcd174fd249fed8a8fa6046ea345173d22c2766186bf336462b2\"\r\n },\r\n {\r\n \"name\": \"dogsheep-index\",\r\n \"path\": \"dogsheep-index.db\",\r\n \"size\": 5496832,\r\n \"is_mutable\": false,\r\n \"is_memory\": false,\r\n \"hash\": \"d1ea238d204e5b9ae783c86e4af5bcdf21267c1f391de3e468d9665494ee012a\"\r\n }\r\n]\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. 
?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1060870237", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1060870237, "node_id": "IC_kwDOBm6k_c4_O5hd", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-03-07T16:19:22Z", "updated_at": "2022-03-07T16:19:22Z", "author_association": "OWNER", "body": "I didn't need to do any of the fancy regular expression routing stuff after all, since the new dash encoding format avoids using `/` so a simple `[^/]+` can capture the correct segments from the URL.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. ?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1060044007", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1060044007, "node_id": "IC_kwDOBm6k_c4_Lvzn", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-03-06T21:38:15Z", "updated_at": "2022-03-06T21:38:15Z", "author_association": "OWNER", "body": "Test: https://github.com/simonw/datasette/blob/d2e3fe3facf0ed0abf8b00cd54463af90dd6904d/tests/test_utils.py#L651-L666\r\n\r\nOne big advantage to this scheme is that redirecting old links to `%2F` pages (e.g. https://fivethirtyeight.datasettes.com/fivethirtyeight/twitter-ratio%2Fsenators) is easy - if you see a `%` in the `raw_path`, redirect to that page with the `%` replaced by `-`.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. 
?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1059903309", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1059903309, "node_id": "IC_kwDOBm6k_c4_LNdN", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-03-06T06:17:51Z", "updated_at": "2022-03-06T06:17:51Z", "author_association": "OWNER", "body": "Suggestion from a conversation with Seth Michael Larson: it would be neat if plugins could easily integrate with whatever scheme this ends up using, maybe with the `/db/table/-/plugin-name` standardized pattern or similar.\r\n\r\nMaking it easy for plugins to do the right, consistent thing is a good idea.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. ?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1059864154", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1059864154, "node_id": "IC_kwDOBm6k_c4_LD5a", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-03-06T00:59:04Z", "updated_at": "2022-03-06T00:59:04Z", "author_association": "OWNER", "body": "Needs more testing, but this seems to work for decoding the percent-escaped-with-dashes format: `urllib.parse.unquote(s.replace('-', '%'))`", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. 
?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1059863997", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1059863997, "node_id": "IC_kwDOBm6k_c4_LD29", "user": {"value": 505230, "label": "karlcow"}, "created_at": "2022-03-06T00:57:57Z", "updated_at": "2022-03-06T00:57:57Z", "author_association": "NONE", "body": "Probably too late\u2026 but I have just seen this because \r\nhttp://simonwillison.net/2022/Mar/5/dash-encoding/#atom-everything\r\n\r\nAnd it reminded me of comma tools at W3C.\r\nhttp://www.w3.org/,tools\r\n\r\nExample, the text version of W3C homepage\r\nhttps://www.w3.org/,text\r\n\r\n\r\n> The challenge comes down to telling the difference between the following:\r\n> \r\n> * `/db/table` - an HTML table page\r\n\r\n`/db/table`\r\n\r\n> * `/db/table.csv` - the CSV version of `/db/table`\r\n\r\n`/db/table,csv`\r\n\r\n> * `/db/table.csv` - no this one is actually a database table called `table.csv`\r\n\r\n`/db/table.csv`\r\n\r\n> * `/db/table.csv.csv` - the CSV version of `/db/table.csv`\r\n\r\n`/db/table.csv,csv`\r\n\r\n> * `/db/table.csv.csv.csv` and so on...\r\n\r\n`/db/table.csv.csv,csv`\r\n\r\n\r\nI haven't checked all the cases in the thread.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. 
?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1059855418", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1059855418, "node_id": "IC_kwDOBm6k_c4_LBw6", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-03-06T00:00:53Z", "updated_at": "2022-03-06T00:04:18Z", "author_association": "OWNER", "body": "```python\r\n_ESCAPE_SAFE = frozenset(\r\n b'ABCDEFGHIJKLMNOPQRSTUVWXYZ'\r\n b'abcdefghijklmnopqrstuvwxyz'\r\n b'0123456789_'\r\n)\r\n# I removed b'.-~')\r\n\r\nclass Quoter(dict):\r\n # Keeps a cache internally, via __missing__\r\n def __missing__(self, b):\r\n # Handle a cache miss. Store quoted string in cache and return.\r\n res = chr(b) if b in _ESCAPE_SAFE else '-{:02X}'.format(b)\r\n self[b] = res\r\n return res\r\n\r\nquoter = Quoter().__getitem__\r\n\r\n''.join([quoter(char) for char in b'foo/bar.csv'])\r\n# 'foo-2Fbar-2Ecsv'\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. 
?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1059854864", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1059854864, "node_id": "IC_kwDOBm6k_c4_LBoQ", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-03-05T23:59:05Z", "updated_at": "2022-03-05T23:59:05Z", "author_association": "OWNER", "body": "OK, for that percentage thing: the Python core implementation of URL percentage escaping deliberately ignores two of the characters we want to escape: `.` and `-`:\r\n\r\nhttps://github.com/python/cpython/blob/6927632492cbad86a250aa006c1847e03b03e70b/Lib/urllib/parse.py#L780-L783\r\n\r\n```python\r\n_ALWAYS_SAFE = frozenset(b'ABCDEFGHIJKLMNOPQRSTUVWXYZ'\r\n b'abcdefghijklmnopqrstuvwxyz'\r\n b'0123456789'\r\n b'_.-~')\r\n```\r\nIt also defaults to skipping `/` (passed as a `safe=` parameter to various things).\r\n\r\nI'm going to try borrowing and modifying the core of the Python implementation: https://github.com/python/cpython/blob/6927632492cbad86a250aa006c1847e03b03e70b/Lib/urllib/parse.py#L795-L814\r\n```python\r\nclass _Quoter(dict):\r\n \"\"\"A mapping from bytes numbers (in range(0,256)) to strings.\r\n String values are percent-encoded byte values, unless the key < 128, and\r\n in either of the specified safe set, or the always safe set.\r\n \"\"\"\r\n # Keeps a cache internally, via __missing__, for efficiency (lookups\r\n # of cached keys don't call Python code at all).\r\n def __init__(self, safe):\r\n \"\"\"safe: bytes object.\"\"\"\r\n self.safe = _ALWAYS_SAFE.union(safe)\r\n\r\n def __repr__(self):\r\n return f\"<Quoter {dict(self)!r}>\"\r\n\r\n def __missing__(self, b):\r\n # Handle a cache miss.
Store quoted string in cache and return.\r\n res = chr(b) if b in self.safe else '%{:02X}'.format(b)\r\n self[b] = res\r\n return res\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. ?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1059853526", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1059853526, "node_id": "IC_kwDOBm6k_c4_LBTW", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-03-05T23:49:59Z", "updated_at": "2022-03-05T23:49:59Z", "author_association": "OWNER", "body": "I want to try regular percentage encoding, except that it also encodes both the `-` and the `.` characters, AND it uses `-` instead of `%` as the encoding character.\r\n\r\nShould check what it does with emoji too.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. 
?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1059851259", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1059851259, "node_id": "IC_kwDOBm6k_c4_LAv7", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-03-05T23:35:47Z", "updated_at": "2022-03-05T23:35:59Z", "author_association": "OWNER", "body": "This [comment from glyph](https://twitter.com/glyph/status/1500244937312329730) got me thinking:\r\n\r\n> Have you considered replacing % with some other character and then using percent-encoding?\r\n\r\nWhat happens if a table name includes a `%` character and that ends up getting mangled by a misbehaving proxy?\r\n\r\nI should consider `%` in the escaping system too. And maybe go with that suggestion of using percent-encoding directly but with a different character.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. 
?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1059850369", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1059850369, "node_id": "IC_kwDOBm6k_c4_LAiB", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-03-05T23:28:56Z", "updated_at": "2022-03-05T23:28:56Z", "author_association": "OWNER", "body": "Lots of great conversations about the dash encoding implementation on Twitter: https://twitter.com/simonw/status/1500228316309061633\r\n\r\n@dracos helped me figure out a simpler regex: https://twitter.com/dracos/status/1500236433809973248\r\n\r\n`^/(?P<database>[^/]+)/(?P<table>[^\\/\\-\\.]*|\\-/|\\-\\.|\\-\\-)*(?P<format>\\.\\w+)?$`\r\n\r\n![image](https://user-images.githubusercontent.com/9599/156903088-c01933ae-4713-4e91-8d71-affebf70b945.png)\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. ?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1059836599", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1059836599, "node_id": "IC_kwDOBm6k_c4_K9K3", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-03-05T21:52:10Z", "updated_at": "2022-03-05T21:52:10Z", "author_association": "OWNER", "body": "Blogged about this here: https://simonwillison.net/2022/Mar/5/dash-encoding/", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. 
?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1045069481", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1045069481, "node_id": "IC_kwDOBm6k_c4-Sn6p", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-02-18T19:34:41Z", "updated_at": "2022-03-05T21:32:22Z", "author_association": "OWNER", "body": "I think I got format extraction working! https://regex101.com/r/A0bW1D/1\r\n\r\n ^/(?P<database>[^/]+)/(?P<table>(?:[^\\/\\-\\.]*|(?:\\-/)*|(?:\\-\\.)*|(?:\\-\\-)*)*?)(?:(?<!\\-)\\.(?P<format>\\w+))?$\r\n\r\nI had to make that crazy inner one even more complicated to stop it from capturing `.` that was not part of `-.`.\r\n\r\n (?:[^\\/\\-\\.]*|(?:\\-/)*|(?:\\-\\.)*|(?:\\-\\-)*)*\r\n\r\nVisualized:\r\n\r\n\"image\"\r\n\r\nSo now I have a regex which can extract out the dot-encoded table name AND spot if there is an optional `.format` at the end:\r\n\r\n\"image\"\r\n\r\nIf I end up using this in Datasette it's going to need VERY comprehensive unit tests and inline documentation.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. ?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1059822391", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1059822391, "node_id": "IC_kwDOBm6k_c4_K5s3", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-03-05T19:50:12Z", "updated_at": "2022-03-05T19:50:12Z", "author_association": "OWNER", "body": "I'm going to move this work to a PR.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. 
?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1059822151", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1059822151, "node_id": "IC_kwDOBm6k_c4_K5pH", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-03-05T19:48:35Z", "updated_at": "2022-03-05T19:48:35Z", "author_association": "OWNER", "body": "Those new docs: https://github.com/simonw/datasette/blob/d1cb73180b4b5a07538380db76298618a5fc46b6/docs/internals.rst#dash-encoding", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. ?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1059802318", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1059802318, "node_id": "IC_kwDOBm6k_c4_K0zO", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-03-05T17:34:33Z", "updated_at": "2022-03-05T17:34:33Z", "author_association": "OWNER", "body": "Wrote documentation:\r\n\r\n\"Dash\r\n\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. 
?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1053973425", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1053973425, "node_id": "IC_kwDOBm6k_c4-0lux", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-02-28T07:40:12Z", "updated_at": "2022-02-28T07:40:12Z", "author_association": "OWNER", "body": "If I make this change it will break existing links to one of the oldest Datasette demos: http://fivethirtyeight.datasettes.com/fivethirtyeight/avengers%2Favengers\r\n\r\nA plugin that fixes those by redirecting them on 404 would be neat.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. ?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1049126151", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1049126151, "node_id": "IC_kwDOBm6k_c4-iGUH", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-02-23T19:17:01Z", "updated_at": "2022-02-23T19:17:01Z", "author_association": "OWNER", "body": "Actually the relevant code looks to be: https://github.com/simonw/datasette/blob/7d24fd405f3c60e4c852c5d746c91aa2ba23cf5b/datasette/views/base.py#L481-L498", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. 
?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1049124390", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1049124390, "node_id": "IC_kwDOBm6k_c4-iF4m", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-02-23T19:15:00Z", "updated_at": "2022-02-23T19:15:00Z", "author_association": "OWNER", "body": "I'll start by modifying this function: https://github.com/simonw/datasette/blob/458f03ad3a454d271f47a643f4530bd8b60ddb76/datasette/utils/__init__.py#L732-L749\r\n\r\nLater I want to move this to the routing layer to split out `format` automatically, as seen in the regexes here: https://github.com/simonw/datasette/issues/1439#issuecomment-1045069481", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. ?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1049114724", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1049114724, "node_id": "IC_kwDOBm6k_c4-iDhk", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-02-23T19:04:40Z", "updated_at": "2022-02-23T19:04:40Z", "author_association": "OWNER", "body": "I'm going to try dash encoding for table names (and row IDs) in a branch and see how I like it.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. 
?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1045269544", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1045269544, "node_id": "IC_kwDOBm6k_c4-TYwo", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-02-18T22:19:29Z", "updated_at": "2022-02-18T22:19:29Z", "author_association": "OWNER", "body": "Note that I've ruled out using `Accept: application/json` to return JSON because it turns out Cloudflare and potentially other CDNs ignore the `Vary: Accept` header entirely:\r\n- https://github.com/simonw/datasette/issues/1534", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. ?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1045134050", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1045134050, "node_id": "IC_kwDOBm6k_c4-S3ri", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-02-18T20:25:04Z", "updated_at": "2022-02-18T20:25:04Z", "author_association": "OWNER", "body": "Here's a useful modern spec for how existing URL percentage encoding is supposed to work: https://url.spec.whatwg.org/#percent-encoded-bytes", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. 
?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1045131086", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1045131086, "node_id": "IC_kwDOBm6k_c4-S29O", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-02-18T20:22:13Z", "updated_at": "2022-02-18T20:22:47Z", "author_association": "OWNER", "body": "Should it encode `%` symbols too, since they have a special meaning in URLs and we can't guarantee that every single web server / proxy out there will round-trip them safely using percentage encoding? If so, would need to pick a different encoding character for them. Maybe `%` becomes `-p` - and in that case `/` could become `-s` too.\r\n\r\nIs it worth expanding dash-encoding outside of just `/` and `-` and `.` though? Not sure.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. ?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1045117304", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1045117304, "node_id": "IC_kwDOBm6k_c4-Szl4", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-02-18T20:09:22Z", "updated_at": "2022-02-18T20:09:22Z", "author_association": "OWNER", "body": "Adopting this could result in supporting database files with surprising characters in their filename too.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. 
?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1045108611", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1045108611, "node_id": "IC_kwDOBm6k_c4-SxeD", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-02-18T20:02:19Z", "updated_at": "2022-02-18T20:08:34Z", "author_association": "OWNER", "body": "One other potential variant:\r\n```python\r\ndef dash_encode(s):\r\n return s.replace(\"-\", \"-dash-\").replace(\".\", \"-dot-\").replace(\"/\", \"-slash-\")\r\n\r\ndef dash_decode(s):\r\n return s.replace(\"-slash-\", \"/\").replace(\"-dot-\", \".\").replace(\"-dash-\", \"-\")\r\n```\r\nExcept this has bugs - it doesn't round-trip safely, because it can get confused about things like `-dash-slash-` in terms of is that a `-dash-` or a `-slash-`?\r\n```pycon\r\n>>> dash_encode(\"/db/table-.csv.csv\")\r\n'-slash-db-slash-table-dash--dot-csv-dot-csv'\r\n>>> dash_decode('-slash-db-slash-table-dash--dot-csv-dot-csv')\r\n'/db/table-.csv.csv'\r\n>>> dash_encode('-slash-db-slash-table-dash--dot-csv-dot-csv')\r\n'-dash-slash-dash-db-dash-slash-dash-table-dash-dash-dash--dash-dot-dash-csv-dash-dot-dash-csv'\r\n>>> dash_decode('-dash-slash-dash-db-dash-slash-dash-table-dash-dash-dash--dash-dot-dash-csv-dash-dot-dash-csv')\r\n'-dash/dash-db-dash/dash-table-dash--dash.dash-csv-dash.dash-csv'\r\n```\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. 
?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1045111309", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1045111309, "node_id": "IC_kwDOBm6k_c4-SyIN", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-02-18T20:04:24Z", "updated_at": "2022-02-18T20:05:40Z", "author_association": "OWNER", "body": "This made me worry that my current `dash_decode()` implementation had unknown round-trip bugs, but thankfully this works OK:\r\n```pycon\r\n>>> dash_encode(\"/db/table-.csv.csv\")\r\n'-/db-/table---.csv-.csv'\r\n>>> dash_encode('-/db-/table---.csv-.csv')\r\n'---/db---/table-------.csv---.csv'\r\n>>> dash_decode('---/db---/table-------.csv---.csv')\r\n'-/db-/table---.csv-.csv'\r\n>>> dash_decode('-/db-/table---.csv-.csv')\r\n'/db/table-.csv.csv'\r\n``` \r\nThe regex still works against that double-encoded example too:\r\n\r\n\"image\"\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. 
?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1045099290", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1045099290, "node_id": "IC_kwDOBm6k_c4-SvMa", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-02-18T19:56:18Z", "updated_at": "2022-02-18T19:56:30Z", "author_association": "OWNER", "body": "> ```python\r\n> def dash_encode(s):\r\n> return s.replace(\"-\", \"--\").replace(\".\", \"-.\").replace(\"/\", \"-/\")\r\n> \r\n> def dash_decode(s):\r\n> return s.replace(\"-/\", \"/\").replace(\"-.\", \".\").replace(\"--\", \"-\")\r\n> ```\r\n\r\nI think **dash-encoding** (new name for this) is the right way forward here.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. ?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1045024276", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1045024276, "node_id": "IC_kwDOBm6k_c4-Sc4U", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-02-18T19:01:42Z", "updated_at": "2022-02-18T19:55:24Z", "author_association": "OWNER", "body": "> Maybe I should use `-/` to encode forward slashes too, to defend against any ASGI servers that might not implement `raw_path` correctly.\r\n```python\r\ndef dash_encode(s):\r\n return s.replace(\"-\", \"--\").replace(\".\", \"-.\").replace(\"/\", \"-/\")\r\n\r\ndef dash_decode(s):\r\n return s.replace(\"-/\", \"/\").replace(\"-.\", \".\").replace(\"--\", \"-\")\r\n```\r\n\r\n```pycon\r\n>>> dash_encode(\"foo/bar/baz.csv\")\r\n'foo-/bar-/baz-.csv'\r\n>>> dash_decode('foo-/bar-/baz-.csv')\r\n'foo/bar/baz.csv'\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, 
\"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. ?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1045095348", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1045095348, "node_id": "IC_kwDOBm6k_c4-SuO0", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-02-18T19:53:48Z", "updated_at": "2022-02-18T19:53:48Z", "author_association": "OWNER", "body": "> Ugh, one disadvantage I just spotted with this: Datasette already has a `/-/versions.json` convention where \"system\" URLs are namespaced under `/-/` - but that could be confused under this new scheme with the `-/` escaping sequence.\r\n> \r\n> And I've thought about adding `/db/-/special` and `/db/table/-/special` URLs in the past too.\r\n\r\nI don't think this matters. The new regex does indeed capture that kind of page:\r\n\r\n\"image\"\r\n\r\nBut Datasette goes through configured route regular expressions in order - so I can have the regex that captures `/db/-/special` routes listed before the one that captures tables and formats.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. 
?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1045081042", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1045081042, "node_id": "IC_kwDOBm6k_c4-SqvS", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-02-18T19:44:12Z", "updated_at": "2022-02-18T19:51:34Z", "author_association": "OWNER", "body": "```python\r\ndef dot_encode(s):\r\n return s.replace(\".\", \"..\").replace(\"/\", \"./\")\r\n\r\ndef dot_decode(s):\r\n return s.replace(\"./\", \"/\").replace(\"..\", \".\")\r\n```\r\nNo need for hyphen encoding in this variant at all, which simplifies things a bit.\r\n\r\n(Update: this is flawed, see https://github.com/simonw/datasette/issues/1439#issuecomment-1045086033)", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. ?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1045086033", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1045086033, "node_id": "IC_kwDOBm6k_c4-Sr9R", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-02-18T19:47:43Z", "updated_at": "2022-02-18T19:51:11Z", "author_association": "OWNER", "body": "- https://datasette.io/-/asgi-scope/db/./db./table-..csv..csv\r\n- https://til.simonwillison.net/-/asgi-scope/db/./db./table-..csv..csv\r\n\r\nDo both of those survive the round-trip to populate `raw_path` correctly?\r\n\r\nNo! 
In both cases the `/./` bit goes missing.\r\n\r\nIt looks like this might even be a client issue - `curl` shows me this:\r\n\r\n```\r\n~ % curl -vv -i 'https://datasette.io/-/asgi-scope/db/./db./table-..csv..csv'\r\n* Trying 216.239.32.21:443...\r\n* Connected to datasette.io (216.239.32.21) port 443 (#0)\r\n* ALPN, offering http/1.1\r\n* TLS 1.2 connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256\r\n* Server certificate: datasette.io\r\n* Server certificate: R3\r\n* Server certificate: ISRG Root X1\r\n> GET /-/asgi-scope/db/db./table-..csv..csv HTTP/1.1\r\n```\r\nSo `curl` decided to turn `/-/asgi-scope/db/./db./table` into `/-/asgi-scope/db/db./table` before even sending the request.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. ?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1045082891", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1045082891, "node_id": "IC_kwDOBm6k_c4-SrML", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-02-18T19:45:32Z", "updated_at": "2022-02-18T19:45:32Z", "author_association": "OWNER", "body": "```pycon\r\n>>> dot_encode(\"/db/table-.csv.csv\")\r\n'./db./table-..csv..csv'\r\n>>> dot_decode('./db./table-..csv..csv')\r\n'/db/table-.csv.csv'\r\n```\r\nI worry that web servers might treat `./` in a special way though.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. 
?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1045077590", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1045077590, "node_id": "IC_kwDOBm6k_c4-Sp5W", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-02-18T19:41:37Z", "updated_at": "2022-02-18T19:42:41Z", "author_association": "OWNER", "body": "Ugh, one disadvantage I just spotted with this: Datasette already has a `/-/versions.json` convention where \"system\" URLs are namespaced under `/-/` - but that could be confused under this new scheme with the `-/` escaping sequence.\r\n\r\nAnd I've thought about adding `/db/-/special` and `/db/table/-/special` URLs in the past too.\r\n\r\nMaybe change this system to use `.` as the escaping character instead of `-`?", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. 
?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1045075207", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1045075207, "node_id": "IC_kwDOBm6k_c4-SpUH", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-02-18T19:39:35Z", "updated_at": "2022-02-18T19:40:13Z", "author_association": "OWNER", "body": "> And if for some horrific reason you had a table with the name `/db/table-.csv.csv` (so `/db/` was the first part of the actual table name in SQLite) the URLs would look like this:\r\n> \r\n> * `/db/%2Fdb%2Ftable---.csv-.csv` - the HTML version\r\n> * `/db/%2Fdb%2Ftable---.csv-.csv.csv` - the CSV version\r\n> * `/db/%2Fdb%2Ftable---.csv-.csv.json` - the JSON version\r\n\r\nHere's what those look like with the updated version of `dot_dash_encode()` that also encodes `/` as `-/`:\r\n\r\n- `/db/-/db-/table---.csv-.csv` - HTML\r\n- `/db/-/db-/table---.csv-.csv.csv` - CSV\r\n- `/db/-/db-/table---.csv-.csv.json` - JSON", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. 
?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1045059427", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1045059427, "node_id": "IC_kwDOBm6k_c4-Sldj", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-02-18T19:26:25Z", "updated_at": "2022-02-18T19:26:25Z", "author_association": "OWNER", "body": "With this new pattern I could probably extract out the optional `.json` format string as part of the initial route capturing regex too, rather than the current `table_and_format` hack.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. ?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1045055772", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1045055772, "node_id": "IC_kwDOBm6k_c4-Skkc", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-02-18T19:23:33Z", "updated_at": "2022-02-18T19:25:42Z", "author_association": "OWNER", "body": "I want a match for this URL:\r\n\r\n /db/table-/with-/slashes-.csv\r\n\r\nMaybe this:\r\n\r\n ^/(?P[^/]+)/(?P([^/]*|(\\-/)*|(\\-\\.)*|(\\.\\.)*)*$)\r\n\r\nHere we are matching a sequence of:\r\n\r\n ([^/]*|(\\-/)*|(\\-\\.)*|(\\-\\-)*)*\r\n\r\nSo a combination of not-slashes OR -/ or -. 
Or -- sequences\r\n\r\n\"image\"\r\n\r\n ^/(?P[^/]+)/(?P([^/]*|(\\-/)*|(\\-\\.)*|(\\-\\-)*)*$)\r\n\r\nTry that with non-capturing bits:\r\n\r\n ^/(?P[^/]+)/(?P(?:[^/]*|(?:\\-/)*|(?:\\-\\.)*|(?:\\-\\-)*)*$)\r\n\r\n`(?:[^/]*|(?:\\-/)*|(?:\\-\\.)*|(?:\\-\\-)*)*` visualized is:\r\n\r\n\"image\"\r\n\r\nHere's the explanation on regex101.com https://regex101.com/r/CPnsIO/1\r\n\r\n\"image\"\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. ?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1045032377", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1045032377, "node_id": "IC_kwDOBm6k_c4-Se25", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-02-18T19:06:50Z", "updated_at": "2022-02-18T19:06:50Z", "author_association": "OWNER", "body": "How does URL routing for https://latest.datasette.io/fixtures/table%2Fwith%2Fslashes.csv work?\r\n\r\nRight now it's https://github.com/simonw/datasette/blob/7d24fd405f3c60e4c852c5d746c91aa2ba23cf5b/datasette/app.py#L1098-L1101\r\n\r\nThat's not going to capture the dot-dash encoding version of that table name:\r\n```pycon\r\n>>> dot_dash_encode(\"table/with/slashes.csv\")\r\n'table-/with-/slashes-.csv'\r\n```\r\nProbably needs a fancy regex trick like a negative lookbehind assertion or similar.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. 
?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1045027067", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1045027067, "node_id": "IC_kwDOBm6k_c4-Sdj7", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-02-18T19:03:26Z", "updated_at": "2022-02-18T19:03:26Z", "author_association": "OWNER", "body": "(If I make this change it may break some existing Datasette installations when they upgrade - I could try and build a plugin for them which triggers on 404s and checks to see if the old format would return a 200 response, then returns that.)", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. ?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1031141849", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1031141849, "node_id": "IC_kwDOBm6k_c49dfnZ", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-02-07T07:11:11Z", "updated_at": "2022-02-07T07:11:11Z", "author_association": "OWNER", "body": "I added a Link header to solve this problem for the JSON version in:\r\n- #1533 ", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. 
?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-900715375", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 900715375, "node_id": "IC_kwDOBm6k_c41r9Nv", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-08-18T00:15:28Z", "updated_at": "2021-08-18T00:15:28Z", "author_association": "OWNER", "body": "Maybe I should use `-/` to encode forward slashes too, to defend against any ASGI servers that might not implement `raw_path` correctly.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. ?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-900714630", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 900714630, "node_id": "IC_kwDOBm6k_c41r9CG", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-08-18T00:13:33Z", "updated_at": "2021-08-18T00:13:33Z", "author_association": "OWNER", "body": "The documentation should definitely cover how table names become URLs, in case any third party code needs to be able to calculate this themselves.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. 
?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-900712981", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 900712981, "node_id": "IC_kwDOBm6k_c41r8oV", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-08-18T00:09:59Z", "updated_at": "2021-08-18T00:12:32Z", "author_association": "OWNER", "body": "So given the original examples, a table called `table.csv` would have the following URLs:\r\n\r\n- `/db/table-.csv` - the HTML version\r\n- `/db/table-.csv.csv` - the CSV version\r\n- `/db/table-.csv.json` - the JSON version\r\n\r\nAnd if for some horrific reason you had a table with the name `/db/table-.csv.csv` (so `/db/` was the first part of the actual table name in SQLite) the URLs would look like this:\r\n\r\n- `/db/%2Fdb%2Ftable---.csv-.csv` - the HTML version\r\n- `/db/%2Fdb%2Ftable---.csv-.csv.csv` - the CSV version\r\n- `/db/%2Fdb%2Ftable---.csv-.csv.json` - the JSON version", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. 
?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-900711967", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 900711967, "node_id": "IC_kwDOBm6k_c41r8Yf", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-08-18T00:08:09Z", "updated_at": "2021-08-18T00:08:09Z", "author_association": "OWNER", "body": "Here's an alternative I just made up which I'm calling \"dot dash\" encoding:\r\n\r\n```python\r\ndef dot_dash_encode(s):\r\n return s.replace(\"-\", \"--\").replace(\".\", \"-.\")\r\n\r\ndef dot_dash_decode(s):\r\n return s.replace(\"-.\", \".\").replace(\"--\", \"-\")\r\n```\r\nAnd some examples:\r\n```python\r\nfor example in (\r\n \"hello\",\r\n \"hello.csv\",\r\n \"hello-and-so-on.csv\",\r\n \"hello-.csv\",\r\n \"hello--and--so--on-.csv\",\r\n \"hello.csv.\",\r\n \"hello.csv.-\",\r\n \"hello.csv.--\",\r\n):\r\n print(example)\r\n print(dot_dash_encode(example))\r\n print(example == dot_dash_decode(dot_dash_encode(example)))\r\n print()\r\n```\r\nOutputs:\r\n```\r\nhello\r\nhello\r\nTrue\r\n\r\nhello.csv\r\nhello-.csv\r\nTrue\r\n\r\nhello-and-so-on.csv\r\nhello--and--so--on-.csv\r\nTrue\r\n\r\nhello-.csv\r\nhello---.csv\r\nTrue\r\n\r\nhello--and--so--on-.csv\r\nhello----and----so----on---.csv\r\nTrue\r\n\r\nhello.csv.\r\nhello-.csv-.\r\nTrue\r\n\r\nhello.csv.-\r\nhello-.csv-.--\r\nTrue\r\n\r\nhello.csv.--\r\nhello-.csv-.----\r\nTrue\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. 
?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-900709703", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 900709703, "node_id": "IC_kwDOBm6k_c41r71H", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-08-18T00:03:09Z", "updated_at": "2021-08-18T00:03:09Z", "author_association": "OWNER", "body": "But... what if I invent my own escaping scheme?\r\n\r\nI actually did this once before, in https://github.com/simonw/datasette/commit/9fdb47ca952b93b7b60adddb965ea6642b1ff523 - while I was working on porting Datasette to ASGI in https://github.com/simonw/datasette/issues/272#issuecomment-494192779 because ASGI didn't yet have the `raw_path` mechanism.\r\n\r\nI could bring that back - it looked like this:\r\n\r\n```\r\n \"table/and/slashes\" => \"tableU+002FandU+002Fslashes\"\r\n \"~table\" => \"U+007Etable\"\r\n \"+bobcats!\" => \"U+002Bbobcats!\"\r\n \"U+007Etable\" => \"UU+002B007Etable\"\r\n```\r\nBut I didn't particularly like it - it was quite verbose.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. ?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-900705226", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 900705226, "node_id": "IC_kwDOBm6k_c41r6vK", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-08-17T23:50:32Z", "updated_at": "2021-08-17T23:50:47Z", "author_association": "OWNER", "body": "An alternative solution would be to use some form of escaping for the characters that form the name of the table.\r\n\r\nThe obvious way to do this would be URL-encoding - but it doesn't hold for `.` characters. 
The hex for that is `%2E` but watch what happens with that in a URL:\r\n\r\n```\r\n# Against Cloud Run:\r\ncurl -s 'https://datasette.io/-/asgi-scope/foo/bar%2Fbaz%2E' | rg path\r\n 'path': '/-/asgi-scope/foo/bar/baz.',\r\n 'raw_path': b'/-/asgi-scope/foo/bar%2Fbaz.',\r\n 'root_path': '',\r\n# Against Vercel:\r\ncurl -s 'https://til.simonwillison.net/-/asgi-scope/foo/bar%2Fbaz%2E' | rg path\r\n 'path': '/-/asgi-scope/foo/bar%2Fbaz%2E',\r\n 'raw_path': b'/-/asgi-scope/foo/bar%2Fbaz%2E',\r\n 'root_path': '',\r\n```\r\nSurprisingly in this case Vercel DOES keep it intact, but Cloud Run does not.\r\n\r\nIt's still no good though: I need a solution that works on Vercel, Cloud Run and every other potential hosting provider too.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. ?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-900699670", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 900699670, "node_id": "IC_kwDOBm6k_c41r5YW", "user": {"value": 9599, "label": "simonw"}, "created_at": "2021-08-17T23:34:23Z", "updated_at": "2021-08-17T23:34:23Z", "author_association": "OWNER", "body": "The challenge comes down to telling the difference between the following:\r\n\r\n- `/db/table` - an HTML table page\r\n- `/db/table.csv` - the CSV version of `/db/table`\r\n- `/db/table.csv` - no this one is actually a database table called `table.csv`\r\n- `/db/table.csv.csv` - the CSV version of `/db/table.csv`\r\n- `/db/table.csv.csv.csv` and so on...", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. 
?_format=) works before 1.0"}, "performed_via_github_app": null}