Published 2026-05-18 · Verified every 6 weeks · Sources: 11 named · Authored by SquareRank Team

Cluster 1G · AI Crawler & Bot Management

Squarespace's AI Crawler Panel Controls 26 Bots With One Checkbox

A fresh Squarespace site allows AI bots by default¹ — but the moment an owner toggles the "Block known artificial intelligence crawlers" box, 26 named crawlers are disallowed at once¹. There is no per-bot control in the UI, no granularity between training and retrieval, and several of the bots that decide live citations are not on the Squarespace list at all.

This cluster is the canonical reference for that panel. It names every bot Squarespace's checkbox controls, separates the training crawlers from the retrieval crawlers, walks through Settings → Crawlers step by step, and points to the four leaf articles below for the deep dives. If you have not opened that panel in twelve months, the diagnostic at the bottom will tell you, in under five minutes, whether your site is currently visible to AI search.

§ 01 The default state

What Squarespace actually does with AI crawlers by default

A new Squarespace site ships with the 'Block known artificial intelligence crawlers' checkbox unchecked. That means AI bots are allowed to crawl by default, and Squarespace's own help docs describe this as the discoverability-friendly setting. The real problem is not the default; it is the 2024-era advice that told owners to check the box without explaining the consequences.

Squarespace's help article on AI exclusion states the default unambiguously: "We default to having the box unchecked (which means we haven't added any 'AI do not crawl' requests to your robots.txt file)."¹ The companion article on optimising for AI-powered search engines repeats the recommendation from the other direction, telling owners to "uncheck the box next to Block known artificial intelligence crawlers" to make sure AI engines can scan the site².

That second recommendation hints at the actual problem. Between 2023 and 2025, a wave of designer-blog and forum advice told Squarespace owners to "protect" their content from AI by checking the box. Some owners did, then forgot. Some inherited sites where a previous designer checked it. And many owners assume the box is checked when it is not, or unchecked when it is. The state of that one checkbox decides whether 26 named AI crawlers are asked to stay away from your site.

What the panel actually controls

26: AI crawlers disallowed at once when the Squarespace AI block is enabled (Squarespace Help, 2026).

Off: default state of the 'Block known artificial intelligence crawlers' checkbox (Squarespace Help, 2026).

0: per-bot toggles in the Squarespace Crawlers panel UI (Squarespace Help, 2026).

The all-or-nothing design is the second half of the problem. There is no way, inside the panel, to block GPTBot while allowing Google-Extended, or to block PerplexityBot or Claude-SearchBot at all⁴. Squarespace gives you one switch over a list of 26, and that list does not match the list of bots that decide whether ChatGPT, Perplexity, or Claude will cite you in their next answer.

§ 02 The list

The 26 bots Squarespace's AI checkbox actually disallows

The Squarespace help center publishes the exact list. It names 26 user-agents, which fall into three broad groups: training crawlers from OpenAI, Anthropic, Google, Meta, Apple, ByteDance, and Cohere; bulk web-corpus crawlers like CCBot; and a handful of smaller AI agents from AI2, You.com, Quora, and others. The list is not the same as the list of bots that decide AI citations.

Here is the panel-controlled list, verbatim from Squarespace's own help center¹: AI2Bot, Ai2Bot-Dolma, aiHitBot, Amazonbot, anthropic-ai, Applebot-Extended, Bytespider, CCBot, ClaudeBot, cohere-ai, cohere-training-data-crawler, DuckAssistBot, FacebookBot, Google-Extended, GoogleOther, GoogleOther-Image, GoogleOther-Video, GPTBot, img2dataset, Meta-ExternalAgent, MyCentralAIScraperBot, omgili, omgilibot, Quora-Bot, TikTokSpider, and YouBot.

Three things to notice. First, almost every bot on the list is a training crawler. GPTBot is OpenAI's training collector; ClaudeBot is Anthropic's training collector⁴; Google-Extended is the robots.txt token that governs Gemini training and Vertex AI grounding⁷; Meta-ExternalAgent is Meta's Llama training crawler; Applebot-Extended is the opt-out signal for Apple Intelligence training⁶. Toggling the box on opts you out of training, not out of being cited.

Second, the retrieval bots that pull a page live mid-conversation are mostly absent. ChatGPT-User and OAI-SearchBot, the two bots that drive ChatGPT and ChatGPT Search citations, are not on the list. Claude-User and Claude-SearchBot, the two retrieval agents Anthropic documents⁴, are not on the list. PerplexityBot and Perplexity-User are not on the list⁵. MistralAI-User is not on the list⁹.

Third, the list mixes bots that consistently respect robots.txt (CCBot⁸, GPTBot, ClaudeBot) with ones that do not. Cloudflare published a detailed investigation in August 2025 showing that Perplexity rotates through undeclared crawlers when its declared ones are blocked¹⁰. ByteDance's Bytespider is widely documented as ignoring robots.txt in practice. The Squarespace checkbox is a polite request, not an enforcement mechanism, and the bots that already disregard polite requests are not going to start because of it.
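The second point above can be checked mechanically. The Python sketch below encodes the 26 user-agents quoted from Squarespace's help center and reports which vendor-documented retrieval agents fall outside them. The retrieval set is assembled from the OpenAI, Anthropic, Perplexity, and Mistral documentation cited in this section; it is an illustrative subset, not an exhaustive registry, and the Squarespace list may change between verification passes.

```python
# Snapshot of the 26 user-agents Squarespace's "Block known artificial
# intelligence crawlers" checkbox disallows (per its help center).
SQUARESPACE_AI_BLOCKLIST = frozenset({
    "AI2Bot", "Ai2Bot-Dolma", "aiHitBot", "Amazonbot", "anthropic-ai",
    "Applebot-Extended", "Bytespider", "CCBot", "ClaudeBot", "cohere-ai",
    "cohere-training-data-crawler", "DuckAssistBot", "FacebookBot",
    "Google-Extended", "GoogleOther", "GoogleOther-Image", "GoogleOther-Video",
    "GPTBot", "img2dataset", "Meta-ExternalAgent", "MyCentralAIScraperBot",
    "omgili", "omgilibot", "Quora-Bot", "TikTokSpider", "YouBot",
})

# Retrieval and search agents named in vendor documentation (illustrative subset).
RETRIEVAL_AGENTS = (
    "ChatGPT-User", "OAI-SearchBot",      # OpenAI
    "Claude-User", "Claude-SearchBot",    # Anthropic
    "PerplexityBot", "Perplexity-User",   # Perplexity
    "MistralAI-User",                     # Mistral
)

def not_covered_by_checkbox(agents):
    """Return the agents the Squarespace checkbox does not name (case-insensitive)."""
    listed = {name.lower() for name in SQUARESPACE_AI_BLOCKLIST}
    return [agent for agent in agents if agent.lower() not in listed]

assert len(SQUARESPACE_AI_BLOCKLIST) == 26  # sanity check on the snapshot
print(not_covered_by_checkbox(RETRIEVAL_AGENTS))
```

Against this snapshot, all seven retrieval agents come back as not covered, which is the gap the second point describes.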

§ 03 The distinction

Training crawlers and retrieval crawlers do different jobs

A training crawler scrapes your site to feed the next version of an AI model; a retrieval crawler fetches your page in real time when a user asks the AI a question. Block a training crawler and you opt out of future training runs; block a retrieval crawler and you remove yourself from live citations. The Squarespace AI checkbox does not separate the two.

OpenAI documents three crawlers, and the documentation states the split plainly. GPTBot collects data to train future models. ChatGPT-User visits a page when a user asks ChatGPT a question that requires it. OAI-SearchBot indexes content to surface inside ChatGPT Search results³. The three are independent in robots.txt: an owner can allow OAI-SearchBot and ChatGPT-User while disallowing GPTBot, and that is the configuration most AI-visibility playbooks recommend for a small-business site.
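For illustration, here is what that split looks like as a robots.txt, checked with Python's standard-library parser. This assumes a host that lets you write robots.txt by hand, which Squarespace does not (see § 05); it is a sketch of the configuration, not something you can paste into the Crawlers panel. Python's parser is only a convenient stand-in: each crawler implements robots.txt matching itself.

```python
from urllib.robotparser import RobotFileParser

# Opts out of OpenAI training (GPTBot) while leaving OpenAI's search
# indexer and user-initiated fetcher explicitly allowed.
SPLIT_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /
"""

def check(robots_txt, agents, url="https://example.com/"):
    """Return {agent: allowed?} as Python's robots.txt parser reads the file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}

print(check(SPLIT_ROBOTS_TXT, ["GPTBot", "OAI-SearchBot", "ChatGPT-User"]))
```

The same three-group pattern applies to Anthropic's ClaudeBot, Claude-SearchBot, and Claude-User, with one caveat from the next paragraph: agents that act on a user's behalf may not honour the file at all.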

Anthropic documents the same shape: ClaudeBot trains the model, Claude-User fetches pages on behalf of a user, Claude-SearchBot indexes for search⁴. Perplexity documents two: PerplexityBot for the search index, Perplexity-User for live answers, with Perplexity-User noted to "generally ignore robots.txt rules" because a user initiated the request⁵. Apple separates Applebot (search) from Applebot-Extended (training opt-out only), and is explicit that Applebot-Extended does not actually crawl⁶.

Squarespace's checkbox blocks at the training layer for some engines and at the search layer for others. Google-Extended governs Gemini training and grounding, not Google Search⁷, so blocking it has zero effect on AI Overviews citations and zero effect on Google Search rankings; Google's documentation is explicit on that point. Applebot-Extended is a training opt-out signal that does not affect Applebot itself⁶. GPTBot is training-only, so blocking it has no effect on whether ChatGPT cites you live; it only affects future model versions. The Squarespace UI does not surface any of this nuance.
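A compact way to hold this nuance is a role map. The sketch below records the roles described in this section (the labels "training", "search", "user-fetch", and "gemini-control" are our own shorthand, not vendor terms) and returns what a Disallow for a given agent would change. It summarises the vendor documentation cited above and is not a substitute for it.

```python
# Crawler roles as described in this section; labels are our own shorthand.
CRAWLER_ROLES = {
    "GPTBot": ("OpenAI", "training"),
    "OAI-SearchBot": ("OpenAI", "search"),
    "ChatGPT-User": ("OpenAI", "user-fetch"),
    "ClaudeBot": ("Anthropic", "training"),
    "Claude-SearchBot": ("Anthropic", "search"),
    "Claude-User": ("Anthropic", "user-fetch"),
    "PerplexityBot": ("Perplexity", "search"),
    "Perplexity-User": ("Perplexity", "user-fetch"),
    "Applebot-Extended": ("Apple", "training"),  # opt-out signal only; does not crawl
    "Google-Extended": ("Google", "gemini-control"),
}

EFFECTS = {
    "training": "opts out of future model training; live citations unaffected",
    "gemini-control": "opts out of Gemini training and grounding; "
                      "Google Search, including AI Overviews, unaffected",
    "search": "can drop pages from that engine's AI search results",
    "user-fetch": "can stop live fetches for user questions, if the agent honours robots.txt",
}

def effect_of_blocking(agent):
    """Summarise what a robots.txt Disallow for `agent` changes, by documented role."""
    if agent not in CRAWLER_ROLES:
        return f"{agent}: role not mapped here; check the vendor's documentation"
    vendor, role = CRAWLER_ROLES[agent]
    return f"{agent} ({vendor}): {EFFECTS[role]}"

for agent in ("GPTBot", "Google-Extended", "ChatGPT-User"):
    print(effect_of_blocking(agent))
```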

§ 04 The walkthrough

Settings > Crawlers, step by step

The Squarespace path is short: Settings, then Crawlers. Two checkboxes appear. The top one controls search-engine crawling and should stay on for anyone who wants to be indexed by Google or Bing. The bottom one controls the 26-bot AI list. The fix takes about ninety seconds, but the consequences depend on which mode you intend the site to live in.

Open the Squarespace dashboard and click Settings in the left sidebar. Click Crawlers. The Squarespace help article instructs the same path¹: "Check the box next to Block known artificial intelligence crawlers" to enable the block, or leave it unchecked to allow.

To confirm the state is actually written to your live robots.txt, open yoursite.com/robots.txt in a private browser window. If the AI block is on, the file will list Disallow: / rules under each of the 26 user-agents. If the AI block is off, no AI-specific Disallow rules will appear.
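If you would rather script that check, the sketch below downloads robots.txt and asks Python's standard-library parser whether a sample of the panel-controlled agents may fetch the homepage. The four-agent sample and the helper names are our own assumptions; extend the tuple with more of the 26 if you want a fuller picture.

```python
from urllib.request import Request, urlopen
from urllib.robotparser import RobotFileParser

# A sample of the 26 panel-controlled agents (hypothetical choice; extend as needed).
PANEL_SAMPLE = ("GPTBot", "ClaudeBot", "Google-Extended", "CCBot")

def fetch_robots_txt(domain, timeout=10):
    """Download https://<domain>/robots.txt as text (network access required)."""
    req = Request(f"https://{domain}/robots.txt", headers={"User-Agent": "robots-check/0.1"})
    with urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")

def ai_block_state(robots_txt, agents=PANEL_SAMPLE):
    """Return 'on' if every sampled agent is disallowed at '/', 'off' if none are, else 'mixed'."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    blocked = [not parser.can_fetch(agent, "/") for agent in agents]
    if all(blocked):
        return "on"
    if not any(blocked):
        return "off"
    return "mixed"

# Usage (replace with your own domain):
# print(ai_block_state(fetch_robots_txt("yoursite.com")))
```

A "mixed" result, or an "on" you did not expect, is a cue to read the file itself: a catch-all rule elsewhere in robots.txt can make every agent read as blocked.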

robots.txt excerpt: what appears in your robots.txt when the AI block is enabled

    # Squarespace adds blocks like this for each of the 26 named user-agents
    User-agent: GPTBot
    Disallow: /

    User-agent: ClaudeBot
    Disallow: /

    User-agent: Google-Extended
    Disallow: /

    User-agent: CCBot
    Disallow: /

    # ...and so on for the remaining 22 bots

One subtlety the panel hides: the AI checkbox does nothing about robots.txt rules for the retrieval bots Squarespace did not list. If you want, say, MistralAI-User⁹ or PerplexityBot⁵ blocked, the panel will not get you there. Squarespace does not expose direct robots.txt editing on any plan, and the workarounds (X-Robots-Tag headers, page-level noindex, the SEO settings panel) have limits. The robots-txt-custom leaf documents what is and is not possible.

§ 05 The limits

What you cannot do in Squarespace's robots.txt

Squarespace generates robots.txt automatically and does not provide a way to edit it directly. The Crawlers panel updates the file behind the scenes for the 26-bot AI list and the search-engine toggle, but you cannot add a single per-bot Disallow rule, you cannot allow one path and block another, and you cannot append a Sitemap directive that differs from the default. Squarespace's forum has multiple threads on this, and the answer is consistently that the file cannot be edited by hand.

The platform's design choice trades flexibility for safety. Owners cannot break their robots.txt because they cannot touch it. The cost: per-bot AI control requires going around robots.txt entirely. The three viable workarounds, in order of accessibility:

Page-level noindex via the SEO tab. Available on every plan. Hides a single page from search results, including AI search if the engine respects the directive.
Site-wide meta robots via Code Injection (Business plan and above). Lets you set max-snippet, max-image-preview, and max-video-preview values, which limit how much of each page engines that honour these directives may quote or preview.
X-Robots-Tag headers via Squarespace Developer Mode (Advanced plans, code-comfortable owners only). The most granular option, but the steepest setup curve.
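For the Code Injection route, the tag itself is one line of HTML. The sketch below builds it from the three directive names Google documents (max-snippet, max-image-preview, max-video-preview); the helper name and the default values are illustrative assumptions, not recommendations, and other AI engines may ignore these directives entirely.

```python
# Builds a site-wide robots meta tag for a Code Injection header field.
# Directive names follow Google's robots meta documentation; values are illustrative.
ALLOWED_IMAGE_PREVIEW = {"none", "standard", "large"}

def meta_robots_tag(max_snippet=-1, max_image_preview="large", max_video_preview=-1):
    """Return a <meta name="robots"> tag; -1 means 'no limit' for the numeric directives."""
    if max_image_preview not in ALLOWED_IMAGE_PREVIEW:
        raise ValueError(f"max-image-preview must be one of {sorted(ALLOWED_IMAGE_PREVIEW)}")
    for name, value in (("max_snippet", max_snippet), ("max_video_preview", max_video_preview)):
        if not isinstance(value, int) or value < -1:
            raise ValueError(f"{name} must be an integer >= -1")
    content = (f"max-snippet:{max_snippet}, "
               f"max-image-preview:{max_image_preview}, "
               f"max-video-preview:{max_video_preview}")
    return f'<meta name="robots" content="{content}">'

print(meta_robots_tag())
print(meta_robots_tag(max_snippet=160, max_image_preview="standard", max_video_preview=0))
```

The defaults express "no preview limits"; lower values trade visibility in snippets and AI summaries for tighter control over how much of a page can be quoted.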

None of those three gives you a true per-user-agent allow/deny matrix. The cleanest setup for a Squarespace owner who wants AI citations is the simple one: leave the AI checkbox unchecked, leave the search-engine checkbox checked, ship a strong content layer, and accept that the bots not on Squarespace's list are governed by their own honour-system robots.txt parsing. Cloudflare's August 2025 findings on Perplexity¹⁰ are a reminder that even a perfect robots.txt does not stop a bot that has decided to misbehave.

§ 06 The diagnostic

A five-minute self-check on your current state

Five checks tell you exactly what your site is doing right now. Check robots.txt for the AI Disallow block, check the Crawlers panel for the checkbox state, check the SEO tab on your most important page for noindex, check the response headers for X-Robots-Tag, and run the free crawler-check tool. Anyone can complete the pass in under five minutes.

Step one: open yoursite.com/robots.txt in a private window. Look for the line User-agent: GPTBot. If it is followed by Disallow: /, the AI block is on. If GPTBot is not mentioned, the AI block is off.

Step two: go to Settings → Crawlers in Squarespace. Confirm the checkbox state matches what you saw in robots.txt. Mismatches happen when Squarespace's UI state lags the live file; toggling and saving usually resolves it.

Step three: pick your most important page (the homepage or your top blog post). Open its SEO settings. Confirm "Hide this page from search results" is unchecked. A single page-level noindex on the wrong page can quietly remove a flagship from AI search results without affecting anything else.
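You can also confirm step three from the page source. The sketch below scans HTML for robots meta tags and flags noindex (or none, which implies it). It assumes the "Hide this page from search results" toggle is delivered as a robots meta tag in the page source, which is the usual mechanism for page-level noindex; the class and function names are our own.

```python
from html.parser import HTMLParser

class RobotsMetaFinder(HTMLParser):
    """Collect the directives from every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            self.directives.extend(
                part.strip().lower() for part in attr.get("content", "").split(",")
            )

def page_is_noindexed(html):
    """True if any robots meta tag contains noindex or none."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives or "none" in finder.directives
```

Feed it the HTML of your flagship page (saved from the browser or fetched with urllib.request); a True result means the page is asking every crawler that honours the tag to leave it out of results.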

Step four: open the Network tab in your browser's dev tools, load that page, and look at the response headers for X-Robots-Tag. Squarespace does not add this header by default, but custom code can. If it shows noindex or noai, that directive applies regardless of the panel setting.
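The same header check can be scripted. The sketch below collects every X-Robots-Tag value on a response and reports blocking directives. Note that noai is a non-standard value and crawler support for it varies; treating user-agent-scoped values (such as "googlebot: noindex") as applying to everyone is a deliberate simplification in this sketch.

```python
import re
from urllib.request import Request, urlopen

# noindex and none are standard; noai is non-standard and unevenly supported.
BLOCKING = {"noindex", "none", "noai"}

def blocking_x_robots(header_values):
    """Return the blocking directives found in X-Robots-Tag header values.

    Simplification: a user-agent-scoped value such as "googlebot: noindex"
    is counted for every crawler, which errs on the side of flagging it.
    """
    found = set()
    for value in header_values:
        found.update(t for t in re.split(r"[\s,:]+", value.lower()) if t in BLOCKING)
    return found

def fetch_x_robots(url, timeout=10):
    """Return all X-Robots-Tag header values for `url` (network access required)."""
    req = Request(url, headers={"User-Agent": "header-check/0.1"})
    with urlopen(req, timeout=timeout) as resp:
        return resp.headers.get_all("X-Robots-Tag") or []

# Usage (replace with your own page):
# print(blocking_x_robots(fetch_x_robots("https://yoursite.com/")))
```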

Step five: run the free crawler-check tool. It impersonates each of the major AI user-agents and reports back which ones your site allows. Sixty seconds, no signup. The diagnose leaf in this cluster walks through each step with screenshots.
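To show the idea behind step five, here is a minimal sketch of user-agent impersonation: request a page once per agent and record the HTTP status. It is not how the crawler-check tool is built, only an assumption-laden illustration. Real crawlers send longer User-Agent strings and often come from published IP ranges a CDN may verify, and a 200 only shows the server answered; whether a bot obeys robots.txt is decided by the bot, so pair this with the robots.txt check from step one.

```python
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen

# Short product tokens only (an assumption); real User-Agent strings are longer.
AGENTS = ("GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot",
          "Claude-User", "PerplexityBot", "Perplexity-User")

def http_status(url, user_agent, timeout=10):
    """Fetch `url` with the given User-Agent; return the status code, or None on network error."""
    req = Request(url, headers={"User-Agent": user_agent})
    try:
        with urlopen(req, timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:   # 4xx/5xx responses still carry a status code
        return err.code
    except URLError:           # DNS failure, refused connection, timeout
        return None

def status_by_agent(url, agents=AGENTS, fetch=http_status):
    """Map each agent to the status the server returned when impersonating it."""
    return {agent: fetch(url, agent) for agent in agents}

# Usage (replace with your own page):
# print(status_by_agent("https://yoursite.com/"))
```

The injectable fetch parameter keeps the reporting logic testable without network access.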

What the numbers say about getting this wrong

~25%: drop in traditional search volume Gartner projects for 2026 as AI engines absorb queries (Search Engine Land, 2026-02-23).

37%: share of consumers who start a search with an AI engine first, per January 2026 data (Search Engine Land, 2026-02-23).

26: number of bots the Squarespace AI checkbox controls. Several bots that decide live citations are not on the list (Squarespace Help, 2026).

§ 07 FAQ

Frequently asked questions

Five questions Squarespace owners send us about the Crawlers panel every week, answered in the format AI engines prefer.

Does Squarespace block AI bots by default?

No, the opposite. The 'Block known artificial intelligence crawlers' box in Settings > Crawlers is unchecked by default, which means a fresh Squarespace site allows AI bots to crawl. Squarespace's own help docs describe this as the discoverability-friendly default. The confusion comes from owners who toggled the box on at some point and forgot, or who inherited a site that was configured to block.

How many AI bots does Squarespace's crawler panel control?

Twenty-six, listed by name in the Squarespace help article. The list includes GPTBot, ClaudeBot, Google-Extended, CCBot, Bytespider, Meta-ExternalAgent, Applebot-Extended, and several less-famous training crawlers from Cohere, AI2, You.com, and others. PerplexityBot is not on it.

If the AI block is off by default, what is the actual problem?

Two things. First, many owners checked the box after a 2024 wave of 'protect your content from AI' advice, then never went back. Second, the bots that fetch pages for live AI answers (PerplexityBot, ChatGPT-User, Claude-User, OAI-SearchBot, Perplexity-User) are not on the Squarespace list at all. So whether the box is checked or not, retrieval traffic from those engines is governed by each bot's own robots.txt handling, not by the checkbox.

Can I block one bot but allow another in Squarespace?

No. The Crawlers panel is a single all-or-nothing toggle for the 26-bot AI group. Per-bot control requires custom robots.txt directives, which Squarespace does not expose in the UI. The workarounds are documented in the robots-txt-custom leaf article in this cluster.

Does blocking AI bots remove my content from ChatGPT and Claude?

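For future training, mostly yes; for past training and live answers, no. Squarespace's help docs note that checking the block box does not retroactively remove anything already scraped. Anything ChatGPT, Claude, or Gemini learned from your site before the box was checked is already in the training set. And because ChatGPT-User, OAI-SearchBot, Claude-User, and Claude-SearchBot are not on the list, those engines can still fetch your pages for live answers.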

