issue_comments
10 rows where issue = 447469253 and user = 9599 sorted by updated_at descending
| id | html_url | issue_url | node_id | user | created_at | updated_at ▲ | author_association | body | reactions | issue | performed_via_github_app |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1264769569 | https://github.com/simonw/datasette/issues/485#issuecomment-1264769569 | https://api.github.com/repos/simonw/datasette/issues/485 | IC_kwDOBm6k_c5LYtoh | simonw 9599 | 2022-10-03T00:04:42Z | 2022-10-03T00:04:42Z | OWNER | I love these tips - tools that can compile a simple machine learning model to a SQL query! Would be pretty cool if I could bundle a model in Datasette itself as a big in-memory SQLite SQL query: |
{
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
Improvements to table label detection 447469253 | |
| 1264737290 | https://github.com/simonw/datasette/issues/485#issuecomment-1264737290 | https://api.github.com/repos/simonw/datasette/issues/485 | IC_kwDOBm6k_c5LYlwK | simonw 9599 | 2022-10-02T21:29:59Z | 2022-10-02T21:29:59Z | OWNER | To clarify: the feature this issue is talking about relates to the way Datasette automatically displays foreign key relationships, for example on this page: https://github-to-sqlite.dogsheep.net/github/commits
Each of those columns is a foreign key to another table. The link text that is displayed there comes from the "label column" that has either been configured or automatically detected for that other table. I wonder if this could be handled with a tiny machine learning model that's trained to help pick the best label column? Inputs to that model could include:
Output would be the most likely label column, or some indicator that no likely candidates had been found. My hunch is that this would be better solved using a few extra heuristics rather than by training a model, but it does feel like an interesting opportunity to experiment with a tiny ML model. Asked for tips about this on Twitter: https://twitter.com/simonw/status/1576680930680262658 |
{
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
Improvements to table label detection 447469253 | |
| 497116074 | https://github.com/simonw/datasette/issues/485#issuecomment-497116074 | https://api.github.com/repos/simonw/datasette/issues/485 | MDEyOklzc3VlQ29tbWVudDQ5NzExNjA3NA== | simonw 9599 | 2019-05-29T21:29:16Z | 2019-05-29T21:29:16Z | OWNER | Another good rule of thumb: look for text fields with a unique constraint? |
{
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
Improvements to table label detection 447469253 | |
| 496367866 | https://github.com/simonw/datasette/issues/485#issuecomment-496367866 | https://api.github.com/repos/simonw/datasette/issues/485 | MDEyOklzc3VlQ29tbWVudDQ5NjM2Nzg2Ng== | simonw 9599 | 2019-05-28T05:14:06Z | 2019-05-28T05:14:06Z | OWNER | I'm going to generate statistics for every TEXT column. Any column with more than 90% distinct rows (compared to the total count of rows) will be a candidate for the label. I will then pick the candidate column with the shortest average length. |
{
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
Improvements to table label detection 447469253 | |
| 496283728 | https://github.com/simonw/datasette/issues/485#issuecomment-496283728 | https://api.github.com/repos/simonw/datasette/issues/485 | MDEyOklzc3VlQ29tbWVudDQ5NjI4MzcyOA== | simonw 9599 | 2019-05-27T18:44:07Z | 2019-05-27T18:44:07Z | OWNER | This code now lives in a method on the new |
{
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
Improvements to table label detection 447469253 | |
| 496039483 | https://github.com/simonw/datasette/issues/485#issuecomment-496039483 | https://api.github.com/repos/simonw/datasette/issues/485 | MDEyOklzc3VlQ29tbWVudDQ5NjAzOTQ4Mw== | simonw 9599 | 2019-05-26T23:22:53Z | 2019-05-26T23:22:53Z | OWNER | Comparing these two SQL queries (the one with union and the one without) using explain: So I'm going to use the one without the union. |
{
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
Improvements to table label detection 447469253 | |
| 496039267 | https://github.com/simonw/datasette/issues/485#issuecomment-496039267 | https://api.github.com/repos/simonw/datasette/issues/485 | MDEyOklzc3VlQ29tbWVudDQ5NjAzOTI2Nw== | simonw 9599 | 2019-05-26T23:19:38Z | 2019-05-26T23:20:10Z | OWNER | Thinking about that union query: I imagine doing this with union could encourage multiple full table scans. Maybe this query would only do one? https://latest.datasette.io/fixtures?sql=select%0D%0A++count+%28distinct+name%29+as+count_distinct_column_1%2C%0D%0A++avg%28length%28name%29%29+as+avg_length_column_1%2C%0D%0A++count%28distinct+address%29+as+count_distinct_column_2%2C%0D%0A++avg%28length%28address%29%29+as+avg_length_column_2%0D%0Afrom+roadside_attractions
|
{
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
Improvements to table label detection 447469253 | |
| 495085021 | https://github.com/simonw/datasette/issues/485#issuecomment-495085021 | https://api.github.com/repos/simonw/datasette/issues/485 | MDEyOklzc3VlQ29tbWVudDQ5NTA4NTAyMQ== | simonw 9599 | 2019-05-23T06:27:57Z | 2019-05-26T23:15:51Z | OWNER | I could attempt to calculate the statistics needed for this in a time limited SQL query something like this one: https://latest.datasette.io/fixtures?sql=select+%27name%27+as+column%2C+count+%28distinct+name%29+as+count_distinct%2C+avg%28length%28name%29%29+as+avg_length+from+roadside_attractions%0D%0A++union%0D%0Aselect+%27address%27+as+column%2C+count%28distinct+address%29+as+count_distinct%2C+avg%28length%28address%29%29+as+avg_length+from+roadside_attractions
|
{
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
Improvements to table label detection 447469253 | |
| 496038601 | https://github.com/simonw/datasette/issues/485#issuecomment-496038601 | https://api.github.com/repos/simonw/datasette/issues/485 | MDEyOklzc3VlQ29tbWVudDQ5NjAzODYwMQ== | simonw 9599 | 2019-05-26T23:08:41Z | 2019-05-26T23:08:41Z | OWNER | The code currently assumes the primary key is called "id" or "pk" - improving it to detect the primary key using database introspection should work much better. |
{
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
Improvements to table label detection 447469253 | |
| 495083670 | https://github.com/simonw/datasette/issues/485#issuecomment-495083670 | https://api.github.com/repos/simonw/datasette/issues/485 | MDEyOklzc3VlQ29tbWVudDQ5NTA4MzY3MA== | simonw 9599 | 2019-05-23T06:21:52Z | 2019-05-23T06:22:36Z | OWNER | If a table has more than two columns we could do a better job at guessing the label column. A few potential tricks:
|
{
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
Improvements to table label detection 447469253 |
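
The heuristic described in the 2019-05-28 comment above (gather statistics for every TEXT column, treat any column with more than 90% distinct values as a candidate, then pick the candidate with the shortest average length) could be sketched roughly as below. This is an illustration under stated assumptions, not Datasette's actual implementation: the function name `detect_label_column`, the `threshold` parameter, and the `attractions` demo table are all hypothetical. It collects every statistic in one `select` without a union, in the spirit of the 2019-05-26 comments.

```python
import sqlite3


def detect_label_column(conn, table, threshold=0.9):
    """Hypothetical helper: return the likeliest label column, or None."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk)
    columns = conn.execute(f"PRAGMA table_info([{table}])").fetchall()
    text_columns = [c[1] for c in columns if c[2].upper() == "TEXT"]
    if not text_columns:
        return None
    # Build a single SELECT (no union) covering every TEXT column
    parts = ["count(*)"]
    for name in text_columns:
        parts.append(f"count(distinct [{name}])")
        parts.append(f"avg(length([{name}]))")
    row = conn.execute(f"select {', '.join(parts)} from [{table}]").fetchone()
    total = row[0]
    if total == 0:
        return None
    candidates = []
    for i, name in enumerate(text_columns):
        distinct, avg_length = row[1 + 2 * i], row[2 + 2 * i]
        # Candidate if more than `threshold` of the rows hold distinct values
        if avg_length is not None and distinct / total > threshold:
            candidates.append((avg_length, name))
    # The shortest average length wins
    return min(candidates)[1] if candidates else None


# Made-up demo data, for illustration only
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table attractions "
    "(pk integer primary key, name text, address text, state text)"
)
conn.executemany(
    "insert into attractions (name, address, state) values (?, ?, ?)",
    [
        ("Mystery Spot", "465 Mystery Spot Road, Santa Cruz", "CA"),
        ("Winchester Mystery House", "525 South Winchester Boulevard, San Jose", "CA"),
        ("Bigfoot Discovery Museum", "5497 Highway 9, Felton", "CA"),
    ],
)
print(detect_label_column(conn, "attractions"))
```

In this demo `state` is rejected because only one of its three values is distinct, and `name` beats `address` because its values are shorter on average. A real implementation would presumably also need the time limit mentioned in the 2019-05-23 comment, which this sketch omits.
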
CREATE TABLE [issue_comments] (
[html_url] TEXT,
[issue_url] TEXT,
[id] INTEGER PRIMARY KEY,
[node_id] TEXT,
[user] INTEGER REFERENCES [users]([id]),
[created_at] TEXT,
[updated_at] TEXT,
[author_association] TEXT,
[body] TEXT,
[reactions] TEXT,
[issue] INTEGER REFERENCES [issues]([id])
, [performed_via_github_app] TEXT);
CREATE INDEX [idx_issue_comments_issue]
ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
ON [issue_comments] ([user]);
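
One comment above suggests detecting the primary key through database introspection rather than assuming it is called "id" or "pk". A minimal sketch of that idea, using SQLite's `PRAGMA table_info` on a shortened copy of the schema above, might look like this. The helper name `primary_keys` is hypothetical and this is not necessarily how Datasette does it.

```python
import sqlite3

# Shortened copy of the issue_comments schema, where the primary key
# is not the first column
SCHEMA = """
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT
);
"""


def primary_keys(conn, table):
    """Hypothetical helper: list primary key column names in key order."""
    rows = conn.execute(f"PRAGMA table_info([{table}])").fetchall()
    # Index 5 is "pk": 0 for non-key columns, otherwise the column's
    # 1-based position within the primary key
    keyed = sorted((row[5], row[1]) for row in rows if row[5])
    return [name for _, name in keyed]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
print(primary_keys(conn, "issue_comments"))
```

Because the `pk` value records each column's position within the key, sorting on it also gives the right order for compound primary keys.
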

