
GPTBot: crawler from OpenAI

Crawls the public web to collect training data for OpenAI's GPT models.

GPTBot is OpenAI's crawler for gathering public web pages used to train future GPT models. It is separate from ChatGPT-User, which fetches pages on demand when a ChatGPT user's request needs them, and from OAI-SearchBot, which indexes sites for ChatGPT Search. GPTBot obeys robots.txt and identifies itself with a distinct user-agent token. Because each crawler has its own token, blocking GPTBot does not block ChatGPT-User or OAI-SearchBot. Allow GPTBot if you want your content to shape what ChatGPT "knows" about your domain; disallow it if you want to opt out of training.

Vendor: OpenAI
Category: Crawlers (training & indexing)
User-agent token: GPTBot

robots.txt snippets

Allow GPTBot:
User-agent: GPTBot
Allow: /

Block GPTBot:
User-agent: GPTBot
Disallow: /

FAQ

What is GPTBot?
GPTBot is OpenAI's web crawler for collecting public pages to train future GPT models. It is distinct from ChatGPT-User and OAI-SearchBot, and it obeys robots.txt.
What is the user-agent string for GPTBot?
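GPTBot identifies itself with the user-agent token "GPTBot". In robots.txt, target it with a "User-agent: GPTBot" group; in nginx rules or log-analysis tools, match the same token. The full User-Agent header GPTBot sends is longer than the token (it also carries browser-style and version details), so match the token as a case-insensitive substring rather than comparing the whole header. Any client can send this header, so a match shows what a request claims to be, not that it really came from OpenAI.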
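As an illustration, here is a minimal Python sketch that counts GPTBot requests in an access log. It assumes the nginx/Apache "combined" log format, where the User-Agent is the last double-quoted field. The helper names (is_gptbot, user_agent_of, count_gptbot_hits), the log lines, and the version number in the sample header are all hypothetical.

```python
import re

# Match the "GPTBot" token anywhere in a User-Agent header, ignoring case.
# \b on both sides rejects look-alikes such as "NotGPTBotX".
GPTBOT_TOKEN = re.compile(r"\bGPTBot\b", re.IGNORECASE)

# Every double-quoted field in a log line.
QUOTED_FIELD = re.compile(r'"([^"]*)"')


def is_gptbot(user_agent: str) -> bool:
    """Return True if the header *claims* to be GPTBot (headers can be spoofed)."""
    return GPTBOT_TOKEN.search(user_agent) is not None


def user_agent_of(log_line: str) -> str:
    """Return the last quoted field, which is the User-Agent in the
    "combined" log format assumed here."""
    fields = QUOTED_FIELD.findall(log_line)
    return fields[-1] if fields else ""


def count_gptbot_hits(log_lines: list[str]) -> int:
    """Count log lines whose User-Agent field contains the GPTBot token."""
    return sum(1 for line in log_lines if is_gptbot(user_agent_of(line)))


# Illustrative header shaped like OpenAI's; the version number is an assumption.
SAMPLE_UA = (
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; "
    "compatible; GPTBot/1.1; +https://openai.com/gptbot)"
)

# Hypothetical access-log lines in combined format.
sample_log = [
    f'203.0.113.7 - - [01/Jan/2025:00:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "{SAMPLE_UA}"',
    '198.51.100.2 - - [01/Jan/2025:00:00:01 +0000] "GET /gptbot HTTP/1.1" 404 0 "-" "curl/8.0"',
]

print(count_gptbot_hits(sample_log))  # 1
```

Checking only the User-Agent field, not the whole line, keeps the second request from counting just because its path contains "gptbot".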
How do I allow GPTBot in robots.txt?
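Add the following group to your /robots.txt to explicitly grant GPTBot access:

User-agent: GPTBot
Allow: /

Crawlers that follow robots.txt are allowed by default when no group applies to them, so this group matters mainly when another rule, such as a "User-agent: *" group with "Disallow: /", would otherwise block GPTBot. A crawler that finds a group naming it uses that group instead of the "*" group.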
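A GPTBot group still matters when a wildcard group blocks everyone else. The sketch below evaluates such a file locally with Python's standard-library urllib.robotparser. The agent name SomeOtherBot and the example.com URL are placeholders. This parser is simpler than RFC 9309 in places (for example, it applies the first matching rule in a group rather than the longest one), so treat it as an illustration rather than a reference implementation.

```python
from urllib.robotparser import RobotFileParser

# A site-wide block for every crawler, plus a group that names GPTBot.
ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: GPTBot
Allow: /
"""


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Evaluate robots.txt text locally with Python's standard-library parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


print(allowed(ROBOTS_TXT, "GPTBot", "https://example.com/docs/page"))        # True
print(allowed(ROBOTS_TXT, "SomeOtherBot", "https://example.com/docs/page"))  # False
```

Pass the bare token ("GPTBot"), not a full header. This parser matches rule names against the part of the agent string before the first "/", so a header beginning with "Mozilla/5.0" would not match the GPTBot group.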
How do I block GPTBot in robots.txt?
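Add the following group to your /robots.txt:

User-agent: GPTBot
Disallow: /

GPTBot honors this rule, but robots.txt is advisory: well-behaved crawlers follow it, and nothing in the file itself stops a crawler that ignores it. The rule applies only to GPTBot; ChatGPT-User and OAI-SearchBot need their own groups if you want to block them as well.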
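Because each robots.txt group applies only to the crawlers it names, a "Disallow: /" group for GPTBot leaves OpenAI's other crawlers unaffected. The sketch below checks this with Python's urllib.robotparser; the URL is a placeholder. It only evaluates what the file says. Whether a given crawler actually honors the file is up to that crawler.

```python
from urllib.robotparser import RobotFileParser

# A group that blocks only GPTBot.
BLOCK_GPTBOT = """\
User-agent: GPTBot
Disallow: /
"""


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Evaluate robots.txt text locally with Python's standard-library parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


results = {
    agent: allowed(BLOCK_GPTBOT, agent, "https://example.com/blog/post")
    for agent in ("GPTBot", "ChatGPT-User", "OAI-SearchBot")
}
print(results)  # {'GPTBot': False, 'ChatGPT-User': True, 'OAI-SearchBot': True}
```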
How can I check whether my site is ready for GPTBot?
Run a free check at https://agentics.page. It audits whether your robots.txt allows the right bots, whether you publish llms.txt and JSON-LD structured data, whether your content is server-rendered, and whether GPTBot can actually consume your site.
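For the JSON-LD part of such an audit, here is a small Python sketch that finds structured data in a page's raw HTML. It assumes the check inspects the HTML response as delivered, with no JavaScript executed; if a crawler does not run JavaScript, only markup present in that response is visible to it. The helper names and sample page are hypothetical, and this is not how agentics.page implements its check.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect parsed <script type="application/ld+json"> blocks from raw HTML."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer: list[str] = []
        self.blocks: list[object] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        script_type = (dict(attrs).get("type") or "").strip().lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # this sketch silently skips malformed JSON-LD


def extract_jsonld(html: str) -> list[object]:
    """Return every JSON-LD block in the HTML that parses as JSON."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks


SAMPLE_HTML = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Organization", "name": "Example"}
</script>
<script src="app.js"></script>
</head><body><p>Hello</p></body></html>"""

print([block["@type"] for block in extract_jsonld(SAMPLE_HTML)])  # ['Organization']
```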

Is your domain ready for GPTBot?

agentics checks whether your robots.txt allows the right bots, whether your llms.txt is well-formed, whether your JSON-LD and server-rendered (SSR) content are visible, and whether GPTBot can actually use your domain.


Related agents