Sites scramble to block ChatGPT web crawler after instructions emerge
“According to OpenAI’s documentation, GPTBot will be identifiable by the user agent token ‘GPTBot,’ with its full string being ‘Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)’.
The OpenAI docs also give instructions about how to block GPTBot from crawling websites using the industry-standard robots.txt file, which is a text file that sits at the root directory of a website and instructs web crawlers (such as those used by search engines) not to index the site.”
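
For anyone who wants to see what such a block looks like in practice, here is a minimal sketch, not taken from the quoted article. It assumes the usual robots.txt form of a `User-agent` line followed by a `Disallow` rule, and uses Python's standard `urllib.robotparser` to check how a compliant crawler would read it. The `example.com` URL and the `is_allowed` helper are placeholders made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content: bar the GPTBot token from the whole site.
# Crawlers not named here are unaffected by this rule.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/articles/1"  # placeholder URL
    print(is_allowed(ROBOTS_TXT, "GPTBot", url))     # False
    print(is_allowed(ROBOTS_TXT, "Googlebot", url))  # True
```

Note that robots.txt is advisory: it only has an effect if the crawler chooses to honor it, and the rule above matches on the short user agent token rather than the full user agent string.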