GEO Guide

You Blocked ChatGPT and Didn't Even Know It

Your website may be refusing AI crawlers right now. 5.89% of all sites block GPTBot. Cloudflare blocks AI by default since July 2025. Here's how to check and fix it in 2 minutes.

AI Search Visibility Editorial Team · April 8, 2026 · 5 min read

Right now, your website may be invisible to ChatGPT, Perplexity, and Claude — not because your content is bad, but because your server is refusing to let them in. Open a new tab. Type yoursite.com/robots.txt. If you see GPTBot, ClaudeBot, or PerplexityBot next to a Disallow line, you found the problem.

Check Right Now (10 Seconds)

Open https://yoursite.com/robots.txt in a new tab. This file tells every crawler — search engines and AI bots alike — what they're allowed to access. It takes 10 seconds to read, and for well-behaved bots it's the first gate to your AI visibility.

Look for these user agents next to "Disallow" lines:

  • GPTBot
  • ClaudeBot
  • PerplexityBot
  • OAI-SearchBot
  • ChatGPT-User
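If you'd rather script the check than eyeball the file, a few lines of Python can fetch robots.txt and flag AI user agents that sit in a group with a full `Disallow: /`. This is a minimal sketch, not an official tool; the `AI_BOTS` list and `find_blocked` helper are our own names, and the parser handles only the common case, not every robots.txt corner:

```python
import urllib.request

# The AI user agents worth checking for (from the list above)
AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "OAI-SearchBot", "ChatGPT-User"]

def find_blocked(robots_txt, bots=AI_BOTS):
    """Return AI bots whose user-agent group contains a full 'Disallow: /'."""
    blocked, agents, in_rules = set(), [], False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop comments and whitespace
        if not line:
            continue
        key, _, value = line.partition(":")
        key, value = key.strip().lower(), value.strip()
        if key == "user-agent":
            if in_rules:                          # rules ended; a new group starts
                agents, in_rules = [], False
            agents.append(value)
        else:
            in_rules = True
            if key == "disallow" and value == "/":
                blocked.update(a for a in agents if a in bots)
    return sorted(blocked)

# Live usage (requires network):
#   robots = urllib.request.urlopen("https://yoursite.com/robots.txt", timeout=10).read().decode()
#   print(find_blocked(robots))
```

Anything this prints is a bot that cannot see your site at all — cross-check it against the field guide below before deciding whether that block is intentional.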

Also check these three places — robots.txt isn't always the culprit:

  • Cloudflare dashboard: Security > Bots > "AI Bots" toggle. Since July 2025, new domains block AI by default.
  • WordPress Settings: Settings > Reading > "Discourage search engines" checkbox. This adds noindex and can trigger broad bot blocks.
  • Security plugins: Wordfence, Sucuri, and similar plugins maintain bot blocklists that may include AI crawlers.

The AI Crawler Field Guide

Not all AI bots are equal. The critical distinction is training bots vs. search/citation bots. Block training bots if you want your content excluded from model training; never block search bots if you want AI visibility. The table below marks which bots are safe to block and which you should always allow.

User Agent      | Company    | Purpose         | What Blocking Means
----------------|------------|-----------------|------------------------------------------------
GPTBot          | OpenAI     | Model training  | No training (fine to block)
OAI-SearchBot   | OpenAI     | ChatGPT search  | Can't cite you in ChatGPT search
ChatGPT-User    | OpenAI     | User browsing   | Can't browse your page in ChatGPT
ClaudeBot       | Anthropic  | Chat citation   | Claude can't cite you
anthropic-ai    | Anthropic  | Bulk training   | No training (fine to block)
PerplexityBot   | Perplexity | Search index    | Invisible in Perplexity search
Google-Extended | Google     | Gemini training | No Gemini training (doesn't affect AI Overviews)
Googlebot       | Google     | Search + AIO    | Invisible in Google entirely

Key rule: Block GPTBot, anthropic-ai, and Google-Extended if you want to opt out of training. Keep OAI-SearchBot, ChatGPT-User, ClaudeBot, PerplexityBot, and Googlebot allowed. Always.
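Put into robots.txt form, that rule looks like the sketch below. An empty Disallow value means "allow everything"; Googlebot needs no entry at all, since the absence of a restrictive rule already allows it:

```
# Opt out of model training
User-agent: GPTBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

# Search/citation bots: explicitly allowed (empty Disallow = allow all)
User-agent: OAI-SearchBot
Disallow:

User-agent: ChatGPT-User
Disallow:

User-agent: ClaudeBot
Disallow:

User-agent: PerplexityBot
Disallow:
```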

5 Ways You Accidentally Blocked AI

Most sites don't intentionally block AI crawlers. These five causes account for the majority of accidental blocks we see in audits.

1. Cloudflare's Default Flip (July 2025)

Cloudflare enabled "Block AI Bots" by default for all new domains. With ~20% of the web behind Cloudflare, millions of sites started blocking AI crawlers overnight without any action from site owners.

Exact path: Cloudflare Dashboard > Security > Bots > "Block AI scrapers and crawlers". If this toggle is on, every AI bot gets a 403 before it even sees your robots.txt. Existing domains may have been auto-opted-in during Cloudflare plan renewals.
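You can spot-check this from outside the dashboard by requesting a page with an AI bot's user-agent string and looking at the status code. A minimal sketch, with our own `check_bot_access` and `diagnose` helpers; note that spoofing the UA from your machine isn't identical to the real bot's experience (Cloudflare also fingerprints), and a 403 can have causes other than Cloudflare:

```python
import urllib.request
import urllib.error

def check_bot_access(url, user_agent):
    """Request `url` with the given User-Agent header and return the HTTP status code."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        return e.code          # 403, 429, etc. arrive as HTTPError

def diagnose(status):
    """Rough interpretation of the status code a crawler would see."""
    if status == 403:
        return "blocked before robots.txt (WAF/Cloudflare rule likely)"
    if status == 429:
        return "rate limited"
    if 200 <= status < 300:
        return "reachable"
    return "unexpected status %d" % status

# Live usage (requires network):
#   print(diagnose(check_bot_access("https://yoursite.com/", "GPTBot/1.1")))
```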

2. WordPress Security Plugins

Wordfence: Firewall > Blocking > "Advanced Blocking" — check the User Agent Pattern field for GPTBot, ClaudeBot, or wildcard patterns like *bot*. Sucuri: WAF > Settings > "Blocked User Agents" list. iThemes Security: Security > Bots > "Banned User Agents."

Plugin updates silently add new AI user agents to block lists. After every update, verify your bot allowlist. These plugins also stack: Wordfence can block a bot that Cloudflare already allowed through.

3. Server-Level Firewall Rules

WAF rules that require JavaScript execution or CAPTCHAs silently reject AI crawlers. Unlike browser-based bots, AI crawlers cannot solve CAPTCHAs and cannot execute JavaScript challenges (Cloudflare Turnstile, hCaptcha, reCAPTCHA). They receive a 403 or a challenge page, fail the check, and move on.

This is invisible to you. The bot never reaches your server, so your analytics show nothing. Check your WAF's "challenge" or "managed challenge" rules — if they apply to all traffic (not just suspicious IPs), AI crawlers are getting blocked on every request.

4. Staging robots.txt Leftover

The classic two-line disaster that blocks every crawler — Googlebot, GPTBot, all of them:

User-agent: *
Disallow: /

It's the standard staging robots.txt, designed to prevent staging from being indexed. But it ships to production more often than anyone admits — through CI/CD pipelines that copy the wrong file, environment-variable misconfigs, or merge conflicts that default to the restrictive version. Always verify robots.txt after every deployment.
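A deployment-time guard makes that verification automatic. This sketch (our own `assert_not_blanket_blocked` helper, not a standard tool) fails a CI step whenever the deployed robots.txt contains the blanket block above:

```python
def assert_not_blanket_blocked(robots_txt):
    """Exit non-zero if a 'User-agent: *' group contains 'Disallow: /'."""
    wildcard_group = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()
        key, _, value = line.partition(":")
        key, value = key.strip().lower(), value.strip()
        if key == "user-agent":
            wildcard_group = (value == "*")
        elif key == "disallow" and wildcard_group and value == "/":
            raise SystemExit("robots.txt blocks ALL crawlers -- staging file leaked?")

# In CI, fetch the live file and run the check (requires network):
#   import urllib.request
#   assert_not_blanket_blocked(
#       urllib.request.urlopen("https://yoursite.com/robots.txt").read().decode())
```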

5. CDN/Hosting Rate Limiting

AI crawlers make burst requests — they don't browse page by page like humans. A search bot indexing your site may hit 50–100 pages in a few seconds. If your rate limit is set to 30 requests/minute per IP, the bot gets 429 Too Many Requests after the first burst.

After repeated 429s, crawlers deprioritize your domain and reduce crawl frequency — sometimes permanently. Check your hosting provider's rate limiting settings (Vercel, Netlify, AWS CloudFront, and Nginx all have different defaults). Whitelist known AI bot IP ranges, or raise your burst threshold to at least 120 requests/minute.
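To see why a 30 requests/minute cap trips on crawler bursts, you can simulate a fixed-window limiter against a 100-page burst. The numbers and the `fixed_window_429s` helper are illustrative, not any provider's actual implementation:

```python
def fixed_window_429s(request_times, limit, window=60.0):
    """Count requests a fixed-window rate limiter would reject with 429."""
    rejected, window_start, count = 0, None, 0
    for t in sorted(request_times):
        if window_start is None or t - window_start >= window:
            window_start, count = t, 0    # new window opens
        count += 1
        if count > limit:
            rejected += 1
    return rejected

# A crawler burst: 100 pages in ~5 seconds (one request every 50 ms)
burst = [i * 0.05 for i in range(100)]
print(fixed_window_429s(burst, limit=30))   # 70 requests rejected at 30 req/min
print(fixed_window_429s(burst, limit=120))  # 0 rejected at 120 req/min
```

The entire burst lands in one 60-second window, so everything past request 30 gets a 429; raising the cap to 120/minute lets the whole crawl through.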

What Blocking Costs You

AI search is no longer a niche channel. ChatGPT alone surpassed 900 million weekly active users in early 2026 — that's more weekly users than X (Twitter) and LinkedIn combined. Blocking search bots means zero presence in platforms that collectively serve billions of queries per week.

  • 900M+ ChatGPT weekly active users
  • 100M+ Perplexity monthly queries
  • 5.89% of all websites block GPTBot

To make this tangible: if your site gets cited in a ChatGPT search answer that's shown to even 0.001% of those weekly users, that's 9,000 potential visitors — from a single query. Block the bot, and that number is permanently zero. Every day your site blocks AI crawlers is a day competitors accumulate citations you're not eligible for.

The Perplexity Stealth Crawler Controversy (August 2025)

In August 2025, Cloudflare publicly documented that Perplexity was using stealth crawlers — user agents that didn't identify as PerplexityBot — to bypass robots.txt blocks. The crawlers impersonated regular browser user agents while systematically scraping content for Perplexity's search index.

Cloudflare responded by fingerprinting the stealth bots and offering blocking tools, and Perplexity faced significant backlash from publishers. The incident underscores a practical reality: robots.txt is a voluntary standard. Well-behaved bots respect it, but there's no technical enforcement.

The takeaway isn't to give up on robots.txt. It's to be strategic: allow the search bots you want to cite you, block the training bots you don't, and accept that controlling all AI access is not realistic. The sites that win are the ones that make themselves easy to cite, not the ones that try to hide.

Not sure what's blocking you? Our audit checks AI crawler accessibility as the first of 7 branches. Indexability testing identifies robots.txt blocks, Cloudflare settings, firewall issues, and rate limiting — in 60 seconds.

Run your first audit free

First 5 audits free. No credit card required.

Frequently Asked Questions

Does allowing or blocking AI crawlers affect my Google rankings?

No. AI crawlers (GPTBot, ClaudeBot, PerplexityBot) are completely separate from Googlebot. Allowing or blocking them has zero effect on your Google rankings. Google uses Googlebot for search indexing and Google-Extended only for Gemini training.

Should I block GPTBot?

Your choice. Blocking GPTBot prevents your content from being used in OpenAI model training, but it does not affect ChatGPT search citations. ChatGPT search uses OAI-SearchBot, which is a separate user agent. Block training bots if you want; just keep search bots allowed.

Does Cloudflare's AI bot toggle affect Google AI Overviews?

No. Google AI Overviews use Googlebot, not Google-Extended. Cloudflare's AI bot toggle only affects non-Google AI crawlers. Your AI Overview eligibility is determined by standard Googlebot access and content quality signals.

How do I know if AI crawlers are being blocked?

Check your server access logs for 403 or 429 responses to AI user agents (GPTBot, ClaudeBot, PerplexityBot). If you don't have log access, run a free audit — our indexability branch checks crawler accessibility as its first test.
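If you do have log access, a few lines of Python can scan a combined-format access log for AI bots that got blocked. The regex and the `blocked_hits` helper are illustrative; adjust them to your actual log format:

```python
import re

AI_BOTS = ("GPTBot", "ClaudeBot", "PerplexityBot", "OAI-SearchBot", "ChatGPT-User")

# Combined log format: ... "GET /path HTTP/1.1" STATUS SIZE "referer" "user-agent"
LINE_RE = re.compile(r'" (\d{3}) \d+ "[^"]*" "([^"]*)"')

def blocked_hits(log_lines):
    """Yield (status, bot) for AI-bot requests answered with 403 or 429."""
    for line in log_lines:
        m = LINE_RE.search(line)
        if not m:
            continue
        status, ua = int(m.group(1)), m.group(2)
        if status in (403, 429) and any(bot in ua for bot in AI_BOTS):
            yield status, next(b for b in AI_BOTS if b in ua)

# Usage:
#   with open("/var/log/nginx/access.log") as f:
#       for status, bot in blocked_hits(f):
#           print(status, bot)
```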

Can I block AI bots from specific pages only?

Yes. Use path-specific rules in robots.txt (e.g., Disallow: /private/ for a specific bot) or X-Robots-Tag HTTP headers for per-page control. This lets you protect sensitive content while keeping public pages citable.
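For example, a path-specific robots.txt rule and a per-path response header might look like this (the paths and the Nginx location are placeholders for illustration):

```
# robots.txt: keep PerplexityBot out of /private/ only
User-agent: PerplexityBot
Disallow: /private/

# Nginx: per-page control via an X-Robots-Tag header
location /drafts/ {
    add_header X-Robots-Tag "noindex" always;
}
```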

Author: AI Search Visibility Editorial Team — AI search visibility researchers and GEO practitioners.

Last reviewed: April 8, 2026

Crawler data sourced from OpenAI, Anthropic, Google, and Perplexity official documentation. Cloudflare bot management statistics from Cloudflare Radar reports. All data verified as of April 8, 2026.