Who will train the robots now?

What will be the future of generative AI if publishers start to block the crawlers?

By James Evelegh

Generative AI is incredible. It’s not infallible, it makes mistakes and its words lack soul, but, hey, still incredible.

And publishers can pat themselves on the back for that, because we are, albeit unwittingly and without payment, part of its success.

It is largely our content that has provided the all-important training fodder that programmers and developers have used to fine-tune the algorithms, which then draw on that same content to help generate the answers to user questions.

But with the reports this week that the New York Times (not an insignificant provider of training fodder) has started blocking OpenAI’s web crawler from scraping its content, the foundation stones of generative AI might be starting to wobble.

Hats off to the NYT. Publishers have every right to stop their content being used for someone else’s commercial gain.

It is ethically dubious, to say the least, for organisations to create whole new businesses, based in significant part on the hard work and investment of others, without their permission and without paying.

If this were to be the start of a trend of publishers successfully preventing their content from being scraped, then what would the AI companies use to fuel their generative AI platforms?

Furthermore, the quality of the responses created by ChatGPT et al would start to deteriorate, because the underlying content they use to generate the answers would be of much lower quality.

Publishers can’t afford to be mugs here. As generative AI starts to be used by search engines to provide complete answers to user questions, as opposed to a list of links, publishers risk losing out in the new search landscape.

The AI tech is out there, it’s exciting and we can use it for our benefit, but a regulatory framework that properly reimburses publishers is urgently needed. Blocking the crawlers might give the AI behemoths the incentive they need to engage properly with publishers.

You can catch James Evelegh’s regular column in the InPubWeekly newsletter, which you can register to receive here.