issues: 964322136


id: 964322136
node_id: MDU6SXNzdWU5NjQzMjIxMzY=
number: 1426
title: Manage /robots.txt in Datasette core, block robots by default
user: 9599
state: open
locked: 0
comments: 9
created_at: 2021-08-09T19:56:56Z
updated_at: 2021-12-04T07:11:29Z
author_association: OWNER

See accompanying Twitter thread: https://twitter.com/simonw/status/1424820203603431439

Datasette currently has a plugin for configuring robots.txt (https://datasette.io/plugins/datasette-block-robots), but I'm beginning to think this should be part of core, with crawlers blocked by default - having people explicitly opt in to having their sites crawled and indexed feels a lot safer.

I have a lot of Datasettes deployed now, and tailing logs shows that they are being hammered by search engine crawlers even though many of them are not interesting enough to warrant indexing.

I'm starting to think blocking crawlers would actually be a better default for most people, provided it was well documented and easy to understand how to allow them.

Default-deny is usually a better policy than default-allow!
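The default-deny policy described above amounts to serving a robots.txt that disallows everything, with explicit opt-ins on top. A minimal sketch of a helper that builds such a body (a hypothetical illustration of the idea - this is not the actual code of datasette-block-robots or of Datasette core):

```python
# Hypothetical helper: build a default-deny robots.txt body.
# Any user agents passed in are explicitly allowed; everyone else is blocked.
def robots_txt(allow_agents=None):
    lines = []
    for agent in allow_agents or []:
        # "Disallow:" with an empty value permits this agent to crawl everything
        lines += [f"User-agent: {agent}", "Disallow:", ""]
    # Default-deny rule for all remaining crawlers
    lines += ["User-agent: *", "Disallow: /"]
    return "\n".join(lines) + "\n"

print(robots_txt())              # blocks all crawlers
print(robots_txt(["Googlebot"]))  # opts Googlebot back in, blocks the rest
```

In a Datasette plugin this body would be returned from a route registered for `/robots.txt`, so the opt-in is a deliberate, documented configuration step rather than the default.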

repo: 107914493
type: issue
reactions:
{
    "url": "https://api.github.com/repos/simonw/datasette/issues/1426/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
   

Links from other tables

  • 3 rows from issues_id in issues_labels
  • 9 rows from issue in issue_comments