html_url,issue_url,id,node_id,user,user_label,created_at,updated_at,author_association,body,reactions,issue,issue_label,performed_via_github_app
https://github.com/simonw/datasette/issues/485#issuecomment-497116074,https://api.github.com/repos/simonw/datasette/issues/485,497116074,MDEyOklzc3VlQ29tbWVudDQ5NzExNjA3NA==,9599,simonw,2019-05-29T21:29:16Z,2019-05-29T21:29:16Z,OWNER,Another good rule of thumb: look for text fields with a unique constraint?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",447469253,Improvements to table label detection ,
https://github.com/simonw/datasette/issues/485#issuecomment-496367866,https://api.github.com/repos/simonw/datasette/issues/485,496367866,MDEyOklzc3VlQ29tbWVudDQ5NjM2Nzg2Ng==,9599,simonw,2019-05-28T05:14:06Z,2019-05-28T05:14:06Z,OWNER,"I'm going to generate statistics for every TEXT column.

Any column with more than 90% distinct rows (compared to the total count of rows) will be a candidate for the label.

I will then pick the candidate column with the shortest average length.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",447469253,Improvements to table label detection ,
https://github.com/simonw/datasette/issues/485#issuecomment-496283728,https://api.github.com/repos/simonw/datasette/issues/485,496283728,MDEyOklzc3VlQ29tbWVudDQ5NjI4MzcyOA==,9599,simonw,2019-05-27T18:44:07Z,2019-05-27T18:44:07Z,OWNER,"This code now lives in a method on the new `datasette.database.Database` class, which should make it easier to write unit tests for.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",447469253,Improvements to table label detection ,
https://github.com/simonw/datasette/issues/485#issuecomment-496039483,https://api.github.com/repos/simonw/datasette/issues/485,496039483,MDEyOklzc3VlQ29tbWVudDQ5NjAzOTQ4Mw==,9599,simonw,2019-05-26T23:22:53Z,2019-05-26T23:22:53Z,OWNER,"Comparing these two SQL queries (the one with union and the one without) using explain:

With union: https://latest.datasette.io/fixtures?sql=explain+select+%27name%27+as+column%2C+count+%28distinct+name%29+as+count_distinct%2C+avg%28length%28name%29%29+as+avg_length+from+roadside_attractions%0D%0A++union%0D%0Aselect+%27address%27+as+column%2C+count%28distinct+address%29+as+count_distinct%2C+avg%28length%28address%29%29+as+avg_length+from+roadside_attractions produces 52 rows

Without union: https://latest.datasette.io/fixtures?sql=explain+select%0D%0A++count+(distinct+name)+as+count_distinct_column_1%2C%0D%0A++avg(length(name))+as+avg_length_column_1%2C%0D%0A++count(distinct+address)+as+count_distinct_column_2%2C%0D%0A++avg(length(address))+as+avg_length_column_2%0D%0Afrom+roadside_attractions produces 32 rows

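A rough way to repeat this comparison locally, assuming an empty in-memory stand-in for the fixtures table (the schema and the helper name `explain_row_count` are made up for illustration). The number of rows `explain` returns depends on the SQLite version, so the counts will not necessarily match the 52 and 32 quoted above:

```python
import sqlite3

# Stand-in for the fixtures table; the real schema may differ.
conn = sqlite3.connect(':memory:')
conn.execute(
    'create table roadside_attractions '
    '(pk integer primary key, name text, address text)'
)

WITH_UNION = '''
select 'name' as column, count(distinct name) as count_distinct,
  avg(length(name)) as avg_length from roadside_attractions
  union
select 'address' as column, count(distinct address) as count_distinct,
  avg(length(address)) as avg_length from roadside_attractions
'''

WITHOUT_UNION = '''
select
  count(distinct name) as count_distinct_column_1,
  avg(length(name)) as avg_length_column_1,
  count(distinct address) as count_distinct_column_2,
  avg(length(address)) as avg_length_column_2
from roadside_attractions
'''


def explain_row_count(conn, sql):
    # Each row returned by explain is one bytecode instruction.
    return len(conn.execute('explain ' + sql).fetchall())


union_rows = explain_row_count(conn, WITH_UNION)
single_rows = explain_row_count(conn, WITHOUT_UNION)
print('with union:', union_rows)
print('without union:', single_rows)
```
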
So I'm going to use the one without the union.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",447469253,Improvements to table label detection ,
https://github.com/simonw/datasette/issues/485#issuecomment-496039267,https://api.github.com/repos/simonw/datasette/issues/485,496039267,MDEyOklzc3VlQ29tbWVudDQ5NjAzOTI2Nw==,9599,simonw,2019-05-26T23:19:38Z,2019-05-26T23:20:10Z,OWNER,"Thinking about that union query: I imagine doing this with union could encourage multiple full table scans. Maybe this query would only do one? https://latest.datasette.io/fixtures?sql=select%0D%0A++count+%28distinct+name%29+as+count_distinct_column_1%2C%0D%0A++avg%28length%28name%29%29+as+avg_length_column_1%2C%0D%0A++count%28distinct+address%29+as+count_distinct_column_2%2C%0D%0A++avg%28length%28address%29%29+as+avg_length_column_2%0D%0Afrom+roadside_attractions

```
select
  count (distinct name) as count_distinct_column_1,
  avg(length(name)) as avg_length_column_1,
  count(distinct address) as count_distinct_column_2,
  avg(length(address)) as avg_length_column_2
from roadside_attractions
```
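
A minimal sketch of how that single-query shape could be generated for any list of columns, using Python and the standard `sqlite3` module. The helper name `column_stats` and the sample rows are made up for illustration, and this is not necessarily how Datasette ends up doing it:

```python
import sqlite3


def column_stats(conn, table, columns):
    # Two aggregates per column, all in one select against the table.
    # Square brackets quote identifiers in SQLite; names are assumed trusted.
    parts = []
    for i, col in enumerate(columns, start=1):
        parts.append(f'count(distinct [{col}]) as count_distinct_column_{i}')
        parts.append(f'avg(length([{col}])) as avg_length_column_{i}')
    sql = 'select ' + ', '.join(parts) + f' from [{table}]'
    row = conn.execute(sql).fetchone()
    # Unpack the flat result row into {column: (count_distinct, avg_length)}.
    return {col: (row[2 * i], row[2 * i + 1]) for i, col in enumerate(columns)}


# Made-up sample data standing in for the fixtures table.
conn = sqlite3.connect(':memory:')
conn.execute(
    'create table roadside_attractions '
    '(pk integer primary key, name text, address text)'
)
conn.executemany(
    'insert into roadside_attractions (name, address) values (?, ?)',
    [
        ('Alpha', '1 Main St'),
        ('Beta', '1 Main St'),
        ('Gamma', '22 Oak Avenue'),
    ],
)

stats = column_stats(conn, 'roadside_attractions', ['name', 'address'])
print(stats)
```

Each column contributes a `(count_distinct, avg_length)` pair, so the result can be compared against the table row count without issuing one query per column.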

<img width=""800"" alt=""fixtures__select_count__distinct_name__as_count_distinct_column_1__avg_length_name___as_avg_length_column_1__count_distinct_address__as_count_distinct_column_2__avg_length_address___as_avg_length_column_2_from_roadside_attractions"" src=""https://user-images.githubusercontent.com/9599/58388316-201ad580-7fd2-11e9-95c3-c98e2758fc1e.png"">
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",447469253,Improvements to table label detection ,
https://github.com/simonw/datasette/issues/485#issuecomment-495085021,https://api.github.com/repos/simonw/datasette/issues/485,495085021,MDEyOklzc3VlQ29tbWVudDQ5NTA4NTAyMQ==,9599,simonw,2019-05-23T06:27:57Z,2019-05-26T23:15:51Z,OWNER,"I could attempt to calculate the statistics needed for this in a time-limited SQL query, something like this one: https://latest.datasette.io/fixtures?sql=select+%27name%27+as+column%2C+count+%28distinct+name%29+as+count_distinct%2C+avg%28length%28name%29%29+as+avg_length+from+roadside_attractions%0D%0A++union%0D%0Aselect+%27address%27+as+column%2C+count%28distinct+address%29+as+count_distinct%2C+avg%28length%28address%29%29+as+avg_length+from+roadside_attractions

```
select 'name' as column, count (distinct name) as count_distinct, avg(length(name)) as avg_length from roadside_attractions
  union
select 'address' as column, count(distinct address) as count_distinct, avg(length(address)) as avg_length from roadside_attractions
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",447469253,Improvements to table label detection ,
https://github.com/simonw/datasette/issues/485#issuecomment-496038601,https://api.github.com/repos/simonw/datasette/issues/485,496038601,MDEyOklzc3VlQ29tbWVudDQ5NjAzODYwMQ==,9599,simonw,2019-05-26T23:08:41Z,2019-05-26T23:08:41Z,OWNER,"The code currently assumes the primary key is called ""id"" or ""pk"" - improving it to detect the primary key using database introspection should work much better.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",447469253,Improvements to table label detection ,
https://github.com/simonw/datasette/issues/485#issuecomment-495083670,https://api.github.com/repos/simonw/datasette/issues/485,495083670,MDEyOklzc3VlQ29tbWVudDQ5NTA4MzY3MA==,9599,simonw,2019-05-23T06:21:52Z,2019-05-23T06:22:36Z,OWNER,"If a table has more than two columns we could do a better job of guessing the label column. A few potential tricks:


* look for a column called name or title
* look for the first column of type text
* check for the text column with the most diversity in values","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",447469253,Improvements to table label detection ,