{"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1059854864", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1059854864, "node_id": "IC_kwDOBm6k_c4_LBoQ", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-03-05T23:59:05Z", "updated_at": "2022-03-05T23:59:05Z", "author_association": "OWNER", "body": "OK, for that percentage thing: the Python core implementation of URL percentage escaping deliberately ignores two of the characters we want to escape: `.` and `-`:\r\n\r\nhttps://github.com/python/cpython/blob/6927632492cbad86a250aa006c1847e03b03e70b/Lib/urllib/parse.py#L780-L783\r\n\r\n```python\r\n_ALWAYS_SAFE = frozenset(b'ABCDEFGHIJKLMNOPQRSTUVWXYZ'\r\n                         b'abcdefghijklmnopqrstuvwxyz'\r\n                         b'0123456789'\r\n                         b'_.-~')\r\n```\r\nIt also defaults to skipping `/` (passed as a `safe=` parameter to various things).\r\n\r\nI'm going to try borrowing and modifying the core of the Python implementation: https://github.com/python/cpython/blob/6927632492cbad86a250aa006c1847e03b03e70b/Lib/urllib/parse.py#L795-L814\r\n```python\r\nclass _Quoter(dict):\r\n    \"\"\"A mapping from bytes numbers (in range(0,256)) to strings.\r\n    String values are percent-encoded byte values, unless the key < 128, and\r\n    in either of the specified safe set, or the always safe set.\r\n    \"\"\"\r\n    # Keeps a cache internally, via __missing__, for efficiency (lookups\r\n    # of cached keys don't call Python code at all).\r\n    def __init__(self, safe):\r\n        \"\"\"safe: bytes object.\"\"\"\r\n        self.safe = _ALWAYS_SAFE.union(safe)\r\n\r\n    def __repr__(self):\r\n        return f\"<Quoter {dict(self)!r}>\"\r\n\r\n    def __missing__(self, b):\r\n        # Handle a cache miss. Store quoted string in cache and return.\r\n        res = chr(b) if b in self.safe else '%{:02X}'.format(b)\r\n        self[b] = res\r\n        return res\r\n```", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. ?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1059853526", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1059853526, "node_id": "IC_kwDOBm6k_c4_LBTW", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-03-05T23:49:59Z", "updated_at": "2022-03-05T23:49:59Z", "author_association": "OWNER", "body": "I want to try regular percentage encoding, except that it also encodes both the `-` and the `.` characters, AND it uses `-` instead of `%` as the encoding character.\r\n\r\nShould check what it does with emoji too.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. 
?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1059851259", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1059851259, "node_id": "IC_kwDOBm6k_c4_LAv7", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-03-05T23:35:47Z", "updated_at": "2022-03-05T23:35:59Z", "author_association": "OWNER", "body": "This [comment from glyph](https://twitter.com/glyph/status/1500244937312329730) got me thinking:\r\n\r\n> Have you considered replacing % with some other character and then using percent-encoding?\r\n\r\nWhat happens if a table name includes a `%` character and that ends up getting mangled by a misbehaving proxy?\r\n\r\nI should consider `%` in the escaping system too. And maybe go with that suggestion of using percent-encoding directly but with a different character.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. 
?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1059850369", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1059850369, "node_id": "IC_kwDOBm6k_c4_LAiB", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-03-05T23:28:56Z", "updated_at": "2022-03-05T23:28:56Z", "author_association": "OWNER", "body": "Lots of great conversations about the dash encoding implementation on Twitter: https://twitter.com/simonw/status/1500228316309061633\r\n\r\n@dracos helped me figure out a simpler regex: https://twitter.com/dracos/status/1500236433809973248\r\n\r\n`^/(?P<database>[^/]+)/(?P<table>[^\\/\\-\\.]*|\\-/|\\-\\.|\\-\\-)*(?P<format>\\.\\w+)?$`\r\n\r\n![image](https://user-images.githubusercontent.com/9599/156903088-c01933ae-4713-4e91-8d71-affebf70b945.png)\r\n", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. ?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1059836599", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1059836599, "node_id": "IC_kwDOBm6k_c4_K9K3", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-03-05T21:52:10Z", "updated_at": "2022-03-05T21:52:10Z", "author_association": "OWNER", "body": "Blogged about this here: https://simonwillison.net/2022/Mar/5/dash-encoding/", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. 
?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1059822391", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1059822391, "node_id": "IC_kwDOBm6k_c4_K5s3", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-03-05T19:50:12Z", "updated_at": "2022-03-05T19:50:12Z", "author_association": "OWNER", "body": "I'm going to move this work to a PR.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. ?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1059822151", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1059822151, "node_id": "IC_kwDOBm6k_c4_K5pH", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-03-05T19:48:35Z", "updated_at": "2022-03-05T19:48:35Z", "author_association": "OWNER", "body": "Those new docs: https://github.com/simonw/datasette/blob/d1cb73180b4b5a07538380db76298618a5fc46b6/docs/internals.rst#dash-encoding", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. 
?_format=) works before 1.0"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/1439#issuecomment-1059802318", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/1439", "id": 1059802318, "node_id": "IC_kwDOBm6k_c4_K0zO", "user": {"value": 9599, "label": "simonw"}, "created_at": "2022-03-05T17:34:33Z", "updated_at": "2022-03-05T17:34:33Z", "author_association": "OWNER", "body": "Wrote documentation:\r\n\r\n[Screenshot: Dash encoding documentation]", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 973139047, "label": "Rethink how .ext formats (v.s. ?_format=) works before 1.0"}, "performed_via_github_app": null}