Google Sites × Cloudflare: Why robots.txt Was Still Blocking GPTBot

A Real-World Issue with Cloudflare Managed robots.txt

Updated: May 10, 2026


■ Overview

While building a Google Sites + Cloudflare Pages architecture, we encountered a situation where robots.txt appeared correct, yet GPTBot and other AI crawlers were still blocked. Even after manually updating robots.txt, AI crawlers continued to detect:

Disallow: /

The issue was ultimately caused by Cloudflare Managed robots.txt, generated through AI Crawl Control.


■ Symptoms

When viewed in a browser, robots.txt served the intended rules, yet AI crawlers still reported the site as disallowed. This created a mismatch between the human-visible robots.txt and the AI crawler-visible robots.txt.
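The mismatch can be reproduced offline with Python's standard urllib.robotparser. The two bodies below are illustrative stand-ins for what a browser and an AI crawler might each receive, and example.com is a placeholder domain:

```python
from urllib.robotparser import RobotFileParser

# What a human sees in the browser (illustrative).
browser_body = """\
User-agent: *
Allow: /
"""

# What an AI crawler actually receives after Cloudflare's additions (illustrative).
crawler_body = """\
User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /
"""

def gptbot_allowed(body: str) -> bool:
    """Parse a robots.txt body and report whether GPTBot may fetch the root."""
    parser = RobotFileParser()
    parser.parse(body.splitlines())
    return parser.can_fetch("GPTBot", "https://www.example.com/")

print(gptbot_allowed(browser_body))  # True: the browser-visible file permits GPTBot
print(gptbot_allowed(crawler_body))  # False: the merged response blocks it
```

The same body parser yields opposite answers for the two responses, which is exactly the symptom: nothing in your own file explains why the crawler is blocked.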


■ Root Cause

The issue was caused by Cloudflare AI Crawl Control with Managed robots.txt enabled.

Cloudflare automatically injected additional directives such as:

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: ClaudeBot
Disallow: /

This happened even though the manually uploaded robots.txt contained:

User-agent: *
Allow: /
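Under the Robots Exclusion Protocol, a crawler obeys the most specific group that matches its user agent, so the injected GPTBot group wins even though the wildcard group allows everything. A minimal sketch with Python's urllib.robotparser (the merged file and URLs are reconstructed for illustration):

```python
from urllib.robotparser import RobotFileParser

# The merged file as ultimately served (reconstructed for illustration).
merged = """\
User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(merged.splitlines())

# A generic crawler matches only the wildcard group and is allowed...
print(parser.can_fetch("SomeOtherBot", "https://www.example.com/page"))  # True

# ...but GPTBot matches its own group, whose Disallow takes precedence.
print(parser.can_fetch("GPTBot", "https://www.example.com/page"))        # False
```

This is why the manual `User-agent: *` / `Allow: /` rules have no effect on GPTBot once a GPTBot-specific group exists anywhere in the response.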


■ Why This Is Difficult to Notice

The confusing part is that:

https://www.example.com/robots.txt

appears normal in the browser. However, Cloudflare internally merges additional AI crawler directives into the response, so the browser output and the actual crawler response can temporarily differ.
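One way to see what a crawler actually receives is to request robots.txt while identifying as that crawler. A sketch using only the standard library; the domain and the User-Agent strings are illustrative, and real edge behavior may additionally depend on other headers or the client IP:

```python
from urllib import request

def robots_request(url: str, user_agent: str) -> request.Request:
    """Build a robots.txt request that identifies as the given crawler."""
    return request.Request(url, headers={"User-Agent": user_agent})

def fetch_robots(url: str, user_agent: str) -> str:
    """Fetch robots.txt and return the body as text."""
    with request.urlopen(robots_request(url, user_agent)) as resp:
        return resp.read().decode("utf-8", errors="replace")

# Example usage (requires network; replace the domain with your own):
# for ua in ("Mozilla/5.0", "GPTBot/1.0"):
#     print(f"--- {ua} ---")
#     print(fetch_robots("https://www.example.com/robots.txt", ua))
```

Comparing the two bodies side by side makes the hidden divergence visible without guessing from crawler logs.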


■ Solution

The solution was to disable the managed file:

Cloudflare Dashboard → AI → AI Crawl Control → Managed robots.txt → Disable

After disabling Managed robots.txt, the site immediately began serving the intended file:

User-agent: *
Allow: /

Sitemap: https://www.example.com/sitemap.xml


■ Important Technical Detail

Cloudflare Managed robots.txt does not fully replace your file.

Instead, it merges additional directives into your existing robots.txt response.

This means you may still see your own rules while Cloudflare silently appends AI crawler restrictions underneath.
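The merge behavior described above can be modeled as simple concatenation. This is a toy simulation of the observed effect, not Cloudflare's actual implementation:

```python
from urllib.robotparser import RobotFileParser

# Your uploaded file, as served before Cloudflare's additions.
own_rules = "User-agent: *\nAllow: /\n"

# Directives the managed feature appends (reconstructed for illustration).
managed_rules = "\nUser-agent: GPTBot\nDisallow: /\n"

# Toy model of the merge: your rules first, restrictions appended underneath.
merged = own_rules + managed_rules

# Your own rules are still visible at the top of the response...
print(merged.startswith(own_rules))  # True

# ...yet the appended group quietly blocks GPTBot.
parser = RobotFileParser()
parser.parse(merged.splitlines())
print(parser.can_fetch("GPTBot", "https://www.example.com/"))  # False
```

Because the familiar rules appear first, a quick visual check of the file is not enough; the verdict depends on the whole response.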


■ Why This Matters in the AI Search Era

This issue is becoming increasingly important because modern AI systems now actively crawl websites.

Examples include GPTBot (OpenAI), ClaudeBot (Anthropic), and Google-Extended (Google's AI training control token).

As AI-driven search grows, websites increasingly need to consider AI crawler accessibility, not just traditional SEO.


■ Google Sites + Cloudflare Architecture

Google Sites is lightweight and easy to maintain, but has limitations around custom files such as robots.txt, response headers, and domain-level configuration.

By combining Google Sites, Cloudflare Pages, and custom domains, it becomes possible to build web architectures without running a traditional server stack.


■ Practical Takeaway

If AI crawlers still detect:

Disallow: /

even after updating robots.txt manually, check whether Cloudflare Managed robots.txt is still enabled.

This setting can override or append crawler directives automatically, especially for AI-related bots.