19 rows where issue = 657572753 sorted by updated_at descending

View and edit SQL

Suggested facets: created_at (date), updated_at (date)

user

issue

  • ?sort=colname~numeric to sort by by column cast to real · 19

author_association

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions issue performed_via_github_app
709575818 https://github.com/simonw/datasette/issues/894#issuecomment-709575818 https://api.github.com/repos/simonw/datasette/issues/894 MDEyOklzc3VlQ29tbWVudDcwOTU3NTgxOA== simonw 9599 2020-10-15T20:35:03Z 2020-10-15T20:35:03Z OWNER

Prototype so far:

diff --git a/datasette/views/table.py b/datasette/views/table.py
index ea11a51..d61f8bd 100644
--- a/datasette/views/table.py
+++ b/datasette/views/table.py
@@ -497,17 +497,32 @@ class TableView(RowTableShared):
         if sort and sort_desc:
             raise DatasetteError("Cannot use _sort and _sort_desc at the same time")

+        def parse_sort(sort):
+            if "~" in sort:
+                if sort.endswith("~default"):
+                    col = sort.rsplit("~", 1)[0]
+                    return col, escape_sqlite(col)
+                elif sort.endswith("~numeric"):
+                    col = sort.rsplit("~", 1)[0]
+                    return col, "cast(nullif({}, '') as real)".format(escape_sqlite(col))
+                else:
+                    return sort, escape_sqlite(sort)
+            else:
+                return sort, escape_sqlite(sort)
+
         if sort:
-            if sort not in sortable_columns:
-                raise DatasetteError("Cannot sort table by {}".format(sort))
+            sort_column, sort_clause = parse_sort(sort)
+            if sort_column not in sortable_columns:
+                raise DatasetteError("Cannot sort table by {}".format(sort_column))

-            order_by = escape_sqlite(sort)
+            order_by = sort_clause

         if sort_desc:
-            if sort_desc not in sortable_columns:
-                raise DatasetteError("Cannot sort table by {}".format(sort_desc))
+            sort_column, sort_clause = parse_sort(sort_desc)
+            if sort_column not in sortable_columns:
+                raise DatasetteError("Cannot sort table by {}".format(sort_column))

-            order_by = "{} desc".format(escape_sqlite(sort_desc))
+            order_by = "{} desc".format(sort_clause)

         from_sql = "from {table_name} {where}".format(
             table_name=escape_sqlite(table),
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
?sort=colname~numeric to sort by by column cast to real 657572753  
709572425 https://github.com/simonw/datasette/issues/894#issuecomment-709572425 https://api.github.com/repos/simonw/datasette/issues/894 MDEyOklzc3VlQ29tbWVudDcwOTU3MjQyNQ== simonw 9599 2020-10-15T20:28:18Z 2020-10-15T20:28:18Z OWNER

Also need to rethink this template logic that decides if to show a column as sorted or not:
https://github.com/simonw/datasette/blob/4f7c0ebd85ccd8c1853d7aa0147628f7c1b749cc/datasette/templates/_table.html#L10-L14

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
?sort=colname~numeric to sort by by column cast to real 657572753  
709571143 https://github.com/simonw/datasette/issues/894#issuecomment-709571143 https://api.github.com/repos/simonw/datasette/issues/894 MDEyOklzc3VlQ29tbWVudDcwOTU3MTE0Mw== simonw 9599 2020-10-15T20:25:35Z 2020-10-15T20:25:35Z OWNER

cast(nullif(colname, '') as real) can fix this - it will treat '' the same as null.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
?sort=colname~numeric to sort by by column cast to real 657572753  
709569951 https://github.com/simonw/datasette/issues/894#issuecomment-709569951 https://api.github.com/repos/simonw/datasette/issues/894 MDEyOklzc3VlQ29tbWVudDcwOTU2OTk1MQ== simonw 9599 2020-10-15T20:23:02Z 2020-10-15T20:23:02Z OWNER

Something to watch out for: "" empty strings cast to 0.0:

select cast("100" as real), "100", cast(null as real), cast("" as real)

<table> <thead> <tr> <th>cast("100" as real)</th> <th>"100"</th> <th>cast(null as real)</th> <th>cast("" as real)</th> </tr> </thead> <tbody> <tr> <td>100.0</td> <td>100</td> <td> </td> <td>0.0</td> </tr> </tbody> </table>

https://latest.datasette.io/fixtures?sql=select+cast%28%22100%22+as+real%29%2C+%22100%22%2C+cast%28null+as+real%29%2C+cast%28%22%22+as+real%29

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
?sort=colname~numeric to sort by by column cast to real 657572753  
709562940 https://github.com/simonw/datasette/issues/894#issuecomment-709562940 https://api.github.com/repos/simonw/datasette/issues/894 MDEyOklzc3VlQ29tbWVudDcwOTU2Mjk0MA== simonw 9599 2020-10-15T20:08:16Z 2020-10-15T20:08:16Z OWNER
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
?sort=colname~numeric to sort by by column cast to real 657572753  
709546976 https://github.com/simonw/datasette/issues/894#issuecomment-709546976 https://api.github.com/repos/simonw/datasette/issues/894 MDEyOklzc3VlQ29tbWVudDcwOTU0Njk3Ng== simonw 9599 2020-10-15T19:35:55Z 2020-10-15T19:36:38Z OWNER

Much easier solution: if the suffix is ~numeric then treat it as the column name sorted numerically. If the suffix is missing OR the suffix is ~default, sort without casting. Only add the ~default suffix if the column name itself contains at least one ~ symbol.

Using ~ because it doesn't need to be URL-encoded.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
?sort=colname~numeric to sort by by column cast to real 657572753  
709539257 https://github.com/simonw/datasette/issues/894#issuecomment-709539257 https://api.github.com/repos/simonw/datasette/issues/894 MDEyOklzc3VlQ29tbWVudDcwOTUzOTI1Nw== simonw 9599 2020-10-15T19:19:29Z 2020-10-15T19:34:07Z OWNER

Urgh this isn't going to work. %7E~%7E gets decoded as ~~~ so I wouldn't be able to tell the difference.

I could use double-percentage-encoding here instead. I feel like there's a simpler solution that I'm missing (and that may well be in use within Datasette already, I'm not doing great thinking this morning).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
?sort=colname~numeric to sort by by column cast to real 657572753  
709534197 https://github.com/simonw/datasette/issues/894#issuecomment-709534197 https://api.github.com/repos/simonw/datasette/issues/894 MDEyOklzc3VlQ29tbWVudDcwOTUzNDE5Nw== simonw 9599 2020-10-15T19:08:53Z 2020-10-15T19:17:55Z OWNER

Even better solution: use URL encoding in the parameter details. This is consistent with how ?_next= tokens work, e.g. ?_next=0.291861560261786%2Ce%2Cj.

So the format can be:

  • mycolumn
  • urlencoded-mycolumn$castname

For most columns this will look like: ?_sort=score$numeric

For columns with a $ in their name it will be ?_sort=score%24hasdollar$numeric

Problem: both $ and , are usually URL encoded anyway. I need a character which isn't encoded by default, so that I can use its encoded form to show it is part of the column name and its un-encoded form to split the cast indicator.

_ is a candidate here - not encoded by default, but can be encoded as %5F.

The other unreserved non-alphanumeric characters are -, ., _, ~.

Of these, ~ is least likely to show up in a column name. So I'll use that.

  • mycolumn
  • mycolumn~numeric
  • mycolumn%7Ewith%7Etildes~numeric
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
?sort=colname~numeric to sort by by column cast to real 657572753  
709532369 https://github.com/simonw/datasette/issues/894#issuecomment-709532369 https://api.github.com/repos/simonw/datasette/issues/894 MDEyOklzc3VlQ29tbWVudDcwOTUzMjM2OQ== simonw 9599 2020-10-15T19:05:07Z 2020-10-15T19:07:35Z OWNER

Simpler option: ?_sort= column values look like this:

  • mycolumn - for sort by column
  • mycolumn$numeric - for sort by column after cast to float
  • mycolumn$morename$default - for the edge case where the column name itself contains a $ symbol
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
?sort=colname~numeric to sort by by column cast to real 657572753  
709531343 https://github.com/simonw/datasette/issues/894#issuecomment-709531343 https://api.github.com/repos/simonw/datasette/issues/894 MDEyOklzc3VlQ29tbWVudDcwOTUzMTM0Mw== simonw 9599 2020-10-15T19:03:12Z 2020-10-15T19:03:12Z OWNER

The Sort by <select> menu needs a rethink: https://github.com/simonw/datasette/blob/4f7c0ebd85ccd8c1853d7aa0147628f7c1b749cc/datasette/templates/table.html#L80-L87

If it's going to include sort by numeric options it needs a different format - since ?_sort=colname needs to also support ?_sort=numeric_colname - but shouldn't clash with a weird column already named numeric_colname.

Can I come up with a value syntax for this that is guaranteed not to clash with a weirdly named existing column?

I think so. I could use {"type": "numeric", "column": "mycolumn"} as the value - then if a column has a name that is itself valid JSON (weird but possible) the _sort= value would be {"type": "default", "column": "{\"type\": \"numeric\", \"name\": \"column\"}"}.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
?sort=colname~numeric to sort by by column cast to real 657572753  
709525082 https://github.com/simonw/datasette/issues/894#issuecomment-709525082 https://api.github.com/repos/simonw/datasette/issues/894 MDEyOklzc3VlQ29tbWVudDcwOTUyNTA4Mg== simonw 9599 2020-10-15T18:51:59Z 2020-10-15T18:51:59Z OWNER

This is enough of a design to build a working prototype.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
?sort=colname~numeric to sort by by column cast to real 657572753  
709505147 https://github.com/simonw/datasette/issues/894#issuecomment-709505147 https://api.github.com/repos/simonw/datasette/issues/894 MDEyOklzc3VlQ29tbWVudDcwOTUwNTE0Nw== simonw 9599 2020-10-15T18:18:45Z 2020-10-15T18:50:50Z OWNER

This needs querystring parameter design. Some options:

  • ?_sort_clause=cast(mycol+as+integer) and ?_sort_clause_desc=cast(mycol+as+integer) - allowing any expression. This would need to be disabled if arbitrary SQL was turned off, similar to the restrictions on ?_where=.
  • ?_sort_numeric=mycol and ?_sort_numeric_desc=mycol - this would cast to real which would work on integer values as well. It could be allowed even when arbitrary SQL was disabled.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
?sort=colname~numeric to sort by by column cast to real 657572753  
709524123 https://github.com/simonw/datasette/issues/894#issuecomment-709524123 https://api.github.com/repos/simonw/datasette/issues/894 MDEyOklzc3VlQ29tbWVudDcwOTUyNDEyMw== simonw 9599 2020-10-15T18:50:25Z 2020-10-15T18:50:25Z OWNER

For the "Sort by X" select menu case... I could automatically expand that menu to contain extra options for "Sort numerically by X" for each TEXT column in the table. That's a pretty good option.

For the action cog menu, I can add the extra options to the cog menu - and rely on the fact that the title of the page will say "Sorted numerically by colname descending".

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
?sort=colname~numeric to sort by by column cast to real 657572753  
709513483 https://github.com/simonw/datasette/issues/894#issuecomment-709513483 https://api.github.com/repos/simonw/datasette/issues/894 MDEyOklzc3VlQ29tbWVudDcwOTUxMzQ4Mw== simonw 9599 2020-10-15T18:31:56Z 2020-10-15T18:31:56Z OWNER

I think the first version of this feature involves implementing ?_sort_numeric=col and ?_sort_numeric_desc=col plus the JavaScript to detect if those values should be shown in the column actions menu.

One question: how to reflect that this is happening in the current sort UI. This menu here for example:

https://user-images.githubusercontent.com/9599/96171519-e35ac180-0ed9-11eb-9945-6780054e4ad5.png">

And this interface: how should it indicate that a text is currently sorted numerically v.s. sorted alphabetically, and allow the user to switch from one to the other?

https://user-images.githubusercontent.com/9599/96171618-04bbad80-0eda-11eb-946c-ed3b17a8c749.png">

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
?sort=colname~numeric to sort by by column cast to real 657572753  
709511399 https://github.com/simonw/datasette/issues/894#issuecomment-709511399 https://api.github.com/repos/simonw/datasette/issues/894 MDEyOklzc3VlQ29tbWVudDcwOTUxMTM5OQ== simonw 9599 2020-10-15T18:28:09Z 2020-10-15T18:28:09Z OWNER

The simplest solution would be for Python code to scan all of the visible values on the current page and show the column action for "sort by this numeric" based purely on that. I already do that in the JavaScript for "are there any blank values in the first page?" here:

https://github.com/simonw/datasette/blob/4f7c0ebd85ccd8c1853d7aa0147628f7c1b749cc/datasette/static/table.js#L106-L118

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
?sort=colname~numeric to sort by by column cast to real 657572753  
709510422 https://github.com/simonw/datasette/issues/894#issuecomment-709510422 https://api.github.com/repos/simonw/datasette/issues/894 MDEyOklzc3VlQ29tbWVudDcwOTUxMDQyMg== simonw 9599 2020-10-15T18:26:25Z 2020-10-15T18:26:25Z OWNER

There's something interesting about figuring out which sort options should be offered in the column actions menu.

Two options:

  • Try to detect if a text column is all-integers or all-floats, either by scanning the entire table (if it's small enough) or by scanning the first X rows, where X might be the size of the first page or maybe the first 1,000 or similar.
  • Could also let users define this in metadata.yml for the table.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
?sort=colname~numeric to sort by by column cast to real 657572753  
709509635 https://github.com/simonw/datasette/issues/894#issuecomment-709509635 https://api.github.com/repos/simonw/datasette/issues/894 MDEyOklzc3VlQ29tbWVudDcwOTUwOTYzNQ== simonw 9599 2020-10-15T18:24:57Z 2020-10-15T18:24:57Z OWNER

This does feel like a weird plugin hook just because there aren't really THAT many different use-cases that plugins could solve. The ones I can think of are:

  • Sort numeric
  • Sort by parsed date
  • Sort by FTS rank

Could this work if I just allow _sort_clause=?

One possible solution for the no-arbitrary-SQL case: users can define sort orders in metadata.json/yml. So if you want to enable sort-by-distance without enabling arbitrary SQL you could add something like this:

databases:
  mydb:
    tables:
      museums:
        sort_clause: bm25(fts)
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
?sort=colname~numeric to sort by by column cast to real 657572753  
709505953 https://github.com/simonw/datasette/issues/894#issuecomment-709505953 https://api.github.com/repos/simonw/datasette/issues/894 MDEyOklzc3VlQ29tbWVudDcwOTUwNTk1Mw== simonw 9599 2020-10-15T18:20:13Z 2020-10-15T18:20:13Z OWNER

Sorting by date when the column has a junk date format in it is such a column need it should maybe ship in Datasette by default - though I've been trying to avoid adding heavy dependencies like dateutil if I can get away with it.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
?sort=colname~numeric to sort by by column cast to real 657572753  
709505608 https://github.com/simonw/datasette/issues/894#issuecomment-709505608 https://api.github.com/repos/simonw/datasette/issues/894 MDEyOklzc3VlQ29tbWVudDcwOTUwNTYwOA== simonw 9599 2020-10-15T18:19:35Z 2020-10-15T18:19:35Z OWNER

I could even let plugins define new sort types. Imagine a plugin that enables this:

?_sort_date_desc=mycol - where it knows how to handle specific date formats, or even uses dateutil.parser.parse.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
?sort=colname~numeric to sort by by column cast to real 657572753  

Advanced export

JSON shape: default, array, newline-delimited, object

CSV options:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
, [performed_via_github_app] TEXT);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);