Sites scramble to block ChatGPT web crawler after instructions emerge


“According to OpenAI’s documentation, GPTBot will be identifiable by the user agent token “GPTBot,” with its full string being “Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)”.

The OpenAI docs also give instructions about how to block GPTBot from crawling websites using the industry-standard robots.txt file, which is a text file that sits at the root directory of a website and instructs web crawlers (such as those used by search engines) not to index the site.”
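
For anyone who wants to check what such a rule does, here is a minimal sketch. It assumes the commonly cited two-line rule (a "User-agent: GPTBot" line followed by "Disallow: /") and uses Python's standard-library robots.txt parser to confirm that the rule blocks the "GPTBot" token while leaving other crawlers alone. The helper name is_allowed, the example.com URL, and the "SomeOtherBot" agent are illustrative assumptions, not anything from the article or from OpenAI.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt contents: disallow the GPTBot token across the whole site.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    Hypothetical helper for illustration; it parses the rules from a string
    rather than fetching them from a live site.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/some/page"  # placeholder URL
    print("GPTBot allowed:", is_allowed(ROBOTS_TXT, "GPTBot", page))
    print("SomeOtherBot allowed:", is_allowed(ROBOTS_TXT, "SomeOtherBot", page))
```

Note that robots.txt is advisory: it only works if the crawler chooses to honor it, so this check shows what a compliant crawler would do, not an enforced block.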