Exclude unpublished articles from Google search #769

LeMurphant · 2024-09-27T16:21:03Z

Currently, in-progress pages are indexed by Google, and we don't want that. For context, see
https://discord.com/channels/677546901339504640/1088468403406258196/1287127033503154186

One option is to use robots.txt so that only pages in "Live" or "Unlisted" status are open to crawling. I'm not sure how this would interact with caching on Google.

Another option is to simply not render these pages, e.g. by returning a 404.
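
A minimal sketch of the 404 option, assuming each page carries a status string. Only "Live" and "Unlisted" come from this issue; the function name and the "In progress" status used below are hypothetical, and the real app may structure this differently.

```python
# Statuses named in this issue as the ones that should be served.
# Anything else is treated as unpublished (an assumption).
PUBLISHED_STATUSES = {"Live", "Unlisted"}


def status_code_for(page_status: str) -> int:
    """Return the HTTP status code to respond with for a page in this state."""
    return 200 if page_status in PUBLISHED_STATUSES else 404
```

With this, `status_code_for("In progress")` gives 404, so the page is never rendered for crawlers or for anyone else.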

Aprillion · 2024-10-02T10:59:42Z

If I understand the black magic, pages disallowed in robots.txt can still be indexed for search; they just aren't visited by the crawler, so the only info it gets is from the URL (and from links on other pages that point to this page, if the crawler is allowed to follow those). Since we have the question title in the URL, I don't think robots.txt would completely remove those questions from search results; it would only remove the page snippets.
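
To illustrate the crawl-versus-index distinction, here is a small sketch using Python's standard `urllib.robotparser`. The `/questions/` path and the example.org URLs are made up for illustration and are not the site's real routes. It only shows that a `Disallow` rule answers "may the crawler fetch this URL?"; robots.txt has no way of saying "do not index this URL".

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules; the real site's paths may differ.
ROBOTS_TXT = """\
User-agent: *
Disallow: /questions/
"""


def can_crawl(url: str, user_agent: str = "Googlebot") -> bool:
    """Check whether the rules above allow the given crawler to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)
```

A disallowed URL such as `https://example.org/questions/1234/some-title` is not fetched, but the URL itself, title included, is still visible to the search engine through links from other pages.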

Using the X-Robots-Tag: noindex HTTP header instead might have a higher chance of successfully removing them from the search index. The page should get re-crawled once in a while, so when it goes live and we remove the header, it should appear in the search index again sooner or later :blobmaybe: https://developers.google.com/search/docs/crawling-indexing/robots-meta-tag
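
A rough sketch of the header approach, assuming "Live" and "Unlisted" (following the first comment) are the statuses to leave indexable. The function name and the "In progress" status are hypothetical. One caveat from the linked Google docs: the crawler has to be able to fetch the page to see the header, so these pages must not also be disallowed in robots.txt.

```python
# Assumption: pages in these statuses stay indexable, everything else gets noindex.
INDEXABLE_STATUSES = {"Live", "Unlisted"}


def robots_headers(page_status: str) -> dict[str, str]:
    """Return extra response headers for a page in the given state."""
    if page_status in INDEXABLE_STATUSES:
        # No header: once the page goes live, the next re-crawl can index it.
        return {}
    return {"X-Robots-Tag": "noindex"}
```

The returned dict would be merged into the response headers wherever the page is served, e.g. `robots_headers("In progress")` yields `{"X-Robots-Tag": "noindex"}`.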

LeMurphant added this to aisafety.info redesign Sep 27, 2024
