Google Sites × Cloudflare: Why robots.txt Was Still Blocking GPTBot
A Real-World Issue with Cloudflare Managed robots.txt
Updated: May 10, 2026
■ Overview
While building a Google Sites + Cloudflare Pages architecture, we encountered a situation where:
robots.txt appeared correct,
but GPTBot and other AI crawlers were still blocked.
Even after manually updating robots.txt, AI crawlers continued to detect:
Disallow: /
The issue was ultimately caused by:
Cloudflare Managed robots.txt
generated through AI Crawl Control.
■ Symptoms
The following behavior occurred:
robots.txt was manually updated
Browser displayed Allow rules correctly
sitemap.xml worked normally
Cloudflare Pages deployment was healthy
Google Sites worked correctly
However, GPTBot was still blocked
Cloudflare Radar showed Disallow rules
AI crawlers still detected restrictions
This created a mismatch between the robots.txt that humans see in the browser and the robots.txt that AI crawlers actually evaluate.
■ Root Cause
The issue was caused by Cloudflare AI Crawl Control with its Managed robots.txt feature enabled.
Cloudflare automatically injected additional directives such as:
User-agent: GPTBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: ClaudeBot
Disallow: /
This happened even though the manually uploaded robots.txt contained:
User-agent: *
Allow: /
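This matters because, under standard robots.txt semantics, the most specific matching user-agent group wins: a "User-agent: GPTBot" group takes precedence over "User-agent: *" for GPTBot, regardless of what the wildcard group allows. Here is a minimal sketch using Python's standard urllib.robotparser to demonstrate this; the merged file content is illustrative:

```python
from urllib.robotparser import RobotFileParser

# Illustrative merged robots.txt: the site's own permissive rules,
# plus the directives injected by a managed robots.txt feature.
merged = """\
User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /
"""

rp = RobotFileParser()
rp.parse(merged.splitlines())

# The specific GPTBot group overrides the wildcard group:
print(rp.can_fetch("GPTBot", "https://www.example.com/"))        # False
print(rp.can_fetch("SomeOtherBot", "https://www.example.com/"))  # True
```

So a single file can read as "fully open" in its first group and still block specific AI crawlers further down.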
■ Why This Is Difficult to Notice
The confusing part is that:
https://www.example.com/robots.txt
appears normal in the browser.
However, Cloudflare internally merges additional AI crawler directives into the response.
As a result, the browser output and the actual crawler response can temporarily differ. A quick way to test for this is sketched below.
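To rule out a user-agent-dependent or stale cached response, fetch robots.txt twice with different User-Agent headers and compare the bodies. A minimal diagnostic sketch using only the Python standard library (the domain is a placeholder, and the GPTBot token is a simplified stand-in for its full User-Agent string):

```python
import urllib.request

URL = "https://www.example.com/robots.txt"  # placeholder domain

def fetch(user_agent: str) -> str:
    """Fetch robots.txt while presenting the given User-Agent header."""
    req = urllib.request.Request(URL, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")

browser_view = fetch("Mozilla/5.0")  # what a human sees
crawler_view = fetch("GPTBot/1.1")   # simplified AI-crawler token

if browser_view == crawler_view:
    print("Same robots.txt served to both User-Agents")
else:
    print("robots.txt differs by User-Agent:")
    print(crawler_view)
```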
■ Solution
The solution was:
Cloudflare Dashboard
↓
AI
↓
AI Crawl Control
↓
Managed robots.txt
↓
Disable
After disabling Managed robots.txt, the site immediately began serving the intended file:
User-agent: *
Allow: /
Sitemap: https://www.example.com/sitemap.xml
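To confirm the fix from a crawler's point of view, the live file can be re-checked with urllib.robotparser. A small verification sketch, again with a placeholder domain:

```python
from urllib.robotparser import RobotFileParser

# Fetch and parse the live robots.txt after disabling Managed robots.txt.
rp = RobotFileParser("https://www.example.com/robots.txt")
rp.read()

for bot in ("GPTBot", "Google-Extended", "ClaudeBot"):
    allowed = rp.can_fetch(bot, "https://www.example.com/")
    print(f"{bot}: {'allowed' if allowed else 'blocked'}")
```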
■ Important Technical Detail
Cloudflare Managed robots.txt does not fully replace your file.
Instead, it merges additional directives into your existing robots.txt response.
This means you may still see your own rules while Cloudflare silently appends AI crawler restrictions underneath.
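As a hedged illustration (the exact directives depend on your AI Crawl Control settings), the merged response can look like this:

User-agent: *
Allow: /
Sitemap: https://www.example.com/sitemap.xml

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /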
■ Why This Matters in the AI Search Era
This issue is becoming increasingly important because modern AI systems now actively crawl websites.
Examples include:
GPTBot
ChatGPT-User
Google-Extended
ClaudeBot
Perplexity crawlers
As AI-driven search grows, websites increasingly need to consider:
robots.txt
sitemap.xml
AI crawler permissions
structured content
internal linking
canonical management
not just traditional SEO.
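One practical consequence: rather than relying on the wildcard group alone, a site that wants AI visibility can state crawler permissions explicitly, which makes the intent easier to audit. A hedged sketch of such a robots.txt (placeholder domain):

User-agent: *
Allow: /

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

Sitemap: https://www.example.com/sitemap.xml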
■ Google Sites + Cloudflare Architecture
Google Sites is lightweight and easy to maintain, but has limitations around:
robots.txt customization
sitemap management
redirects
canonical control
advanced SEO configuration
By combining:
Google Sites
+
Cloudflare Pages
+
Custom Domains
it becomes possible to build:
low-cost
lightweight
AI-accessible
SEO-optimized
web architectures without running a traditional server stack.
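In this architecture, robots.txt and sitemap.xml can live as plain static files in the Cloudflare Pages project, served at the root of the custom domain (assuming Pages handles that root). One possible minimal layout:

(project root)
├── index.html
├── robots.txt
└── sitemap.xml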
■ Practical Takeaway
If AI crawlers still detect:
Disallow: /
even after updating robots.txt manually, check whether Cloudflare Managed robots.txt is still enabled.
This setting can override or append crawler directives automatically, especially for AI-related bots.