Google Sites × Cloudflare: Why robots.txt Was Still Blocking GPTBot
A Real-World Issue with Cloudflare Managed robots.txt
Updated: May 10, 2026
■ Overview
While building a Google Sites + Cloudflare Pages architecture, we encountered a situation where:
robots.txt appeared correct,
but GPTBot and other AI crawlers were still blocked.
Even after manually updating robots.txt, AI crawlers continued to detect:
Disallow: /
The issue was ultimately caused by:
Cloudflare Managed robots.txt
generated through AI Crawl Control.
■ Symptoms
The following behavior occurred:
robots.txt was manually updated
Browser displayed Allow rules correctly
sitemap.xml worked normally
Cloudflare Pages deployment was healthy
Google Sites worked correctly
However, GPTBot was still blocked
Cloudflare Radar showed Disallow rules
AI crawlers still detected restrictions
This created a mismatch between the robots.txt that humans see in the browser and the robots.txt that AI crawlers actually evaluate.
■ Root Cause
The issue was caused by Cloudflare AI Crawl Control with its Managed robots.txt feature enabled.
Cloudflare automatically injected additional directives such as:
User-agent: GPTBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: ClaudeBot
Disallow: /
This happened even though the manually uploaded robots.txt contained:
User-agent: *
Allow: /
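This matters because, under standard robots.txt semantics, the most specific matching user-agent group wins: a "User-agent: GPTBot" group takes precedence over "User-agent: *" for GPTBot, regardless of what the wildcard group allows. Here is a minimal sketch using Python's standard urllib.robotparser to demonstrate this; the merged file content is illustrative:

```python
from urllib.robotparser import RobotFileParser

# Illustrative merged robots.txt: the site's own permissive rules,
# plus the directives injected by a managed robots.txt feature.
merged = """\
User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /
"""

rp = RobotFileParser()
rp.parse(merged.splitlines())

# The specific GPTBot group overrides the wildcard group:
print(rp.can_fetch("GPTBot", "https://www.example.com/"))        # False
print(rp.can_fetch("SomeOtherBot", "https://www.example.com/"))  # True
```

So a single file can read as "fully open" in its first group and still block specific AI crawlers further down.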
■ Why This Is Difficult to Notice
The confusing part is that:
https://www.example.com/robots.txt
appears normal in the browser.
However, Cloudflare internally merges additional AI crawler directives into the response.
As a result, the browser output and the actual crawler response can temporarily differ. A quick way to test for this is sketched below.
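To rule out a user-agent-dependent or stale cached response, fetch robots.txt twice with different User-Agent headers and compare the bodies. A minimal diagnostic sketch using only the Python standard library (the domain is a placeholder, and the GPTBot token is a simplified stand-in for its full User-Agent string):

```python
import urllib.request

URL = "https://www.example.com/robots.txt"  # placeholder domain

def fetch(user_agent: str) -> str:
    """Fetch robots.txt while presenting the given User-Agent header."""
    req = urllib.request.Request(URL, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")

browser_view = fetch("Mozilla/5.0")  # what a human sees
crawler_view = fetch("GPTBot/1.1")   # simplified AI-crawler token

if browser_view == crawler_view:
    print("Same robots.txt served to both User-Agents")
else:
    print("robots.txt differs by User-Agent:")
    print(crawler_view)
```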
■ Solution
The solution was:
Cloudflare Dashboard
↓
AI
↓
AI Crawl Control
↓
Managed robots.txt
↓
Disable
After disabling Managed robots.txt, the site immediately began serving the intended file:
User-agent: *
Allow: /
Sitemap: https://www.example.com/sitemap.xml
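To confirm the fix from a crawler's point of view, the live file can be re-checked with urllib.robotparser. A small verification sketch, again with a placeholder domain:

```python
from urllib.robotparser import RobotFileParser

# Fetch and parse the live robots.txt after disabling Managed robots.txt.
rp = RobotFileParser("https://www.example.com/robots.txt")
rp.read()

for bot in ("GPTBot", "Google-Extended", "ClaudeBot"):
    allowed = rp.can_fetch(bot, "https://www.example.com/")
    print(f"{bot}: {'allowed' if allowed else 'blocked'}")
```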
■ Important Technical Detail
Cloudflare Managed robots.txt does not fully replace your file.
Instead, it merges additional directives into your existing robots.txt response.
This means you may still see your own rules while Cloudflare silently appends AI crawler restrictions underneath.
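As a hedged illustration (the exact directives depend on your AI Crawl Control settings), the merged response can look like this:

User-agent: *
Allow: /
Sitemap: https://www.example.com/sitemap.xml

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /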
■ Why This Matters in the AI Search Era
This issue is becoming increasingly important because modern AI systems now actively crawl websites.
Examples include:
GPTBot
ChatGPT-User
Google-Extended
ClaudeBot
Perplexity crawlers
As AI-driven search grows, websites increasingly need to consider:
robots.txt
sitemap.xml
AI crawler permissions
structured content
internal linking
canonical management
not just traditional SEO.
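One practical consequence: rather than relying on the wildcard group alone, a site that wants AI visibility can state crawler permissions explicitly, which makes the intent easier to audit. A hedged sketch of such a robots.txt (placeholder domain):

User-agent: *
Allow: /

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

Sitemap: https://www.example.com/sitemap.xml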
■ Google Sites + Cloudflare Architecture
Google Sites is lightweight and easy to maintain, but has limitations around:
robots.txt customization
sitemap management
redirects
canonical control
advanced SEO configuration
By combining:
Google Sites
+
Cloudflare Pages
+
Custom Domains
it becomes possible to build:
low-cost
lightweight
AI-accessible
SEO-optimized
web architectures without running a traditional server stack.
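In this architecture, robots.txt and sitemap.xml can live as plain static files in the Cloudflare Pages project, served at the root of the custom domain (assuming Pages handles that root). One possible minimal layout:

(project root)
├── index.html
├── robots.txt
└── sitemap.xml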
■ Practical Takeaway
If AI crawlers still detect:
Disallow: /
even after updating robots.txt manually, check whether Cloudflare Managed robots.txt is still enabled.
This setting can override or append crawler directives automatically, especially for AI-related bots.