InPublishing: Guardian blocks OpenAI scraper

The Guardian has confirmed, in a report on its own website, that it has blocked OpenAI from deploying software that harvests its content.

OpenAI announced last month that it will enable website operators to block its web crawler from accessing their content.

A Guardian News & Media spokesperson said: “The scraping of intellectual property from the Guardian’s website for commercial purposes is, and has always been, contrary to our terms of service. The Guardian’s commercial licensing team has many mutually beneficial commercial relationships with developers around the world, and looks forward to building further such relationships in the future.”

According to Originality.ai, which detects AI-generated content, news websites now blocking OpenAI’s GPTBot crawler, which takes data from webpages to feed into the company’s AI models, include CNN, Reuters, the Washington Post, Bloomberg, the New York Times and its sports site the Athletic. Other sites that have blocked GPTBot include Lonely Planet, Amazon, the job listings site Indeed, the question-and-answer site Quora, and dictionary.com, the Guardian reports.
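
The report does not describe how the blocking is done. OpenAI's published guidance describes a robots.txt rule addressed to the GPTBot user agent, so the following is a minimal sketch of how such a rule could be checked, assuming that mechanism. The robots.txt content, the example.com URL, the "SomeOtherBot" name and the is_allowed helper are all hypothetical and are not taken from any of the sites named above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration only;
# this is not the actual file served by any site in the article.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical URL; example.com stands in for a publisher's site.
ARTICLE_URL = "https://example.com/news/article"

gptbot_allowed = is_allowed(ROBOTS_TXT, "GPTBot", ARTICLE_URL)
other_allowed = is_allowed(ROBOTS_TXT, "SomeOtherBot", ARTICLE_URL)
print("GPTBot allowed:", gptbot_allowed)
print("Other crawler allowed:", other_allowed)
```

A rule of this kind is only a request: it works to the extent that the crawler chooses to honour robots.txt, which is why publishers such as the Guardian also point to their terms of service.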
