§ 01 The five

The five checks, in order

The diagnostic moves from the platform layer (robots.txt and the Crawlers panel) down to the page layer (the SEO tab on each page) and finally to the live-fetch layer (the X-Robots-Tag header and the actual crawler audit). Stopping early misses the most common cause of accidental AI blocks, which is a single page-level noindex on a flagship page.

Most Squarespace owners we audit have the panel-level settings correct and the page-level settings wrong. The Squarespace AI checkbox is unchecked by default¹, and most owners have not changed it. The page-level "Hide this page from search results" toggle, however, gets flipped by accident — usually when a designer was hiding a draft page and forgot to unhide a sibling. That single mistake takes a flagship page out of every AI surface in addition to Google Search, and it does not show up in robots.txt.

What this diagnostic catches

Five places a single misconfiguration can take a Squarespace site out of AI search.

Squarespace Help · 2026

Check 3 of the five (page-level noindex) is the most common cause of an accidental AI block.

Squarespace Help · 2026

Sixty seconds is the typical run-time of the automated crawler audit at the end of this pass.

Search Engine Land · 2026-02-23

§ 02 Check 1

Check 1: open yoursite.com/robots.txt in a private window

The robots.txt file is the authoritative record of what your Squarespace site is currently telling crawlers. Open it in a private browser window (to bypass any cached copy) and search for 'User-agent: GPTBot'. If that line is followed by 'Disallow: /', the AI block is on. If GPTBot is not mentioned at all, the AI block is off.

The reason for the private window is that Squarespace's UI sometimes lags the live robots.txt by a few minutes after a toggle change. The file is the truth; the panel is the UI's interpretation of the truth.

robots.txt: what you want to see if the goal is AI citations (excerpt)

    # Squarespace's default robots.txt for a site with the AI block OFF.
    # No AI user-agents are listed. Bots default to allow.
    User-agent: *
    Disallow: /config
    Disallow: /search
    Disallow: /account$
    # ...and other Squarespace system paths
    Sitemap: https://yoursite.com/sitemap.xml

If you do see User-agent: GPTBot followed by Disallow: / (and the same for the other 25 named bots), the AI block is currently on. That is the most common reason a Squarespace site that "should" be AI-visible is not. The fix is one checkbox in the Crawlers panel.
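The same reading of the file can be sketched in Python. This is a minimal sketch, assuming the standard library's urllib.robotparser is a fair stand-in for how a well-behaved crawler interprets robots.txt. The two sample files and the helper name is_blocked are illustrative, not copied from a live Squarespace site.

```python
from urllib.robotparser import RobotFileParser

# Illustrative samples (assumptions), not real Squarespace output.
AI_BLOCK_OFF = """\
User-agent: *
Disallow: /config
Disallow: /search
"""

AI_BLOCK_ON = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /config
Disallow: /search
"""


def is_blocked(robots_txt: str, agent: str, url: str = "https://yoursite.com/") -> bool:
    """Return True if robots_txt disallows `agent` from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


print(is_blocked(AI_BLOCK_OFF, "GPTBot"))  # False: GPTBot falls under "*", homepage allowed
print(is_blocked(AI_BLOCK_ON, "GPTBot"))   # True: GPTBot has its own Disallow: /
```

To run it against a real site, paste the text of yoursite.com/robots.txt into a string and pass it in place of the samples.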

§ 03 Check 2

Check 2: confirm the Crawlers panel matches the file

Open Settings > Crawlers in Squarespace. The 'Block known artificial intelligence crawlers' checkbox should match what you saw in robots.txt. If the two disagree, save the panel again and refresh. A mismatch is rare but can happen after a UI bug or an interrupted save.

For a site that wants AI citations, the recommended state is: search-engine checkbox on (so Googlebot and Bingbot can crawl), AI checkbox off (so the 26-bot AI list remains allowed)³. That is what a fresh Squarespace site ships with, and it is what most sites should stay at.

Mismatches between the panel and the file usually self-resolve. If yours does not within ten minutes of toggling and saving, contact Squarespace support and quote the URL of the robots.txt file you observed.
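If you want to record the comparison rather than eyeball it, a sketch along these lines may help. It assumes the rule from Check 1: the AI block counts as on when the file names an AI user-agent at all. The function names and the one-entry watch list are hypothetical; extend the list with the other bots Squarespace names.

```python
def named_agents(robots_txt: str) -> set[str]:
    """Collect every user-agent the file names explicitly (lowercased, '*' excluded)."""
    agents = set()
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        field, _, value = line.partition(":")
        token = value.strip().lower()
        if field.strip().lower() == "user-agent" and token not in ("", "*"):
            agents.add(token)
    return agents


def panel_matches_file(ai_block_checked: bool, robots_txt: str, ai_agents: set[str]) -> bool:
    """True when the checkbox state agrees with whether the file names any AI agent."""
    file_names_ai = bool(named_agents(robots_txt) & ai_agents)
    return ai_block_checked == file_names_ai


# Hypothetical watch list: only GPTBot is taken from the text above.
WATCHED = {"gptbot"}

observed = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nDisallow: /config\n"
print(panel_matches_file(True, observed, WATCHED))   # True: checkbox on, file blocks GPTBot
print(panel_matches_file(False, observed, WATCHED))  # False: a mismatch worth re-saving
```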

§ 04 Check 3

Check 3: page-level noindex on your most important pages

Open the page settings on your homepage, your top blog post, and your most-trafficked landing page. Click into the SEO tab. Confirm 'Hide this page from search results' is unchecked. This is the most common accidental block on a Squarespace site, and it does not appear anywhere in robots.txt.

The "Hide this page from search results" toggle adds a <meta name="robots" content="noindex"> tag to the page's head². All major AI retrieval bots — ChatGPT-User⁴, Claude-User⁵, Perplexity-User⁶ — read meta robots and skip noindex'd pages. A single accidental check on a flagship page removes that page from every AI surface as well as Google.
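For pages you cannot open in the editor, the tag can also be detected from the page source. The following is a minimal sketch using Python's built-in html.parser. The class and function names are illustrative, and the sample HTML is made up for the example.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives from every <meta name="robots"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            for directive in attributes.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the page source carries a meta robots noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


hidden = '<html><head><meta name="robots" content="noindex"></head><body></body></html>'
visible = "<html><head><title>Home</title></head><body></body></html>"
print(has_noindex(hidden))   # True
print(has_noindex(visible))  # False
```

Feed it the output of "View page source" for the homepage, the top blog post, and the main landing page.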

§ 05 Check 4

Check 4: X-Robots-Tag in the response headers

The X-Robots-Tag HTTP header is the meta robots equivalent that lives at the server response layer. Squarespace does not set it by default, but a layer added around the site, such as a reverse proxy or CDN, can. Open your browser's dev tools, switch to the Network tab, reload the page, click the document request, and look at the response headers. If X-Robots-Tag appears with noindex or noai, it applies no matter what robots.txt and the page settings say.

The reason this check exists is that custom code injected into a Squarespace site can sometimes inject server-side directives indirectly (for example, through edge workers, reverse proxies, or third-party performance plugins). Those directives do not show up in the visible HTML, and they do not show up in robots.txt. They only show up if you look at the response headers.

For most Squarespace sites, this header is empty or absent, which is the expected state. If it is set and you do not remember setting it, the most likely cause is a third-party CDN or analytics integration that adds it for caching or privacy reasons. Trace it back to the integration and decide whether to keep it.
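The header check can be scripted as well. This is a sketch under stated assumptions: it looks only for the two values named above (noindex and noai), the function names are illustrative, and the fetch uses Python's default urllib user-agent, which some hosts treat differently from a browser.

```python
import urllib.request

BLOCKING = {"noindex", "noai"}  # the two values this check looks for


def blocking_directives(header_value: str) -> set[str]:
    """Return the blocking directives found in an X-Robots-Tag value."""
    found = set()
    for part in header_value.split(","):
        token = part.strip().lower()
        # A value may be scoped to one bot, e.g. "googlebot: noindex".
        if ":" in token:
            token = token.split(":", 1)[1].strip()
        if token in BLOCKING:
            found.add(token)
    return found


def fetch_x_robots_tag(url: str, timeout: float = 10.0) -> str:
    """Return the X-Robots-Tag header value(s) for url, or '' if absent."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        values = response.headers.get_all("X-Robots-Tag") or []
    return ", ".join(values)


print(blocking_directives("noindex, nofollow"))  # {'noindex'}
print(blocking_directives(""))                   # set(): the expected state
```

On a live site, blocking_directives(fetch_x_robots_tag("https://yoursite.com/")) returning an empty set corresponds to the expected state described above.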

§ 06 Check 5

Check 5: run the free crawler-check tool

The final check is the automated one. The free crawler-check tool at /tools/crawler-check/ impersonates each major AI bot's user-agent string, requests your homepage, and reports back which ones the site allows. It takes about sixty seconds and requires no signup. The output names every bot that gets a 200 and every bot that gets a 403 or a redirect.

The tool covers the documented user-agents from OpenAI⁴, Anthropic⁵, Perplexity⁶, Apple, Google, Meta, ByteDance, Common Crawl, and Mistral. It does not impersonate the stealth user-agents Cloudflare reported on in August 2025; those are a different problem that no site-level tool can audit reliably.

For a site whose goal is AI citations, the desired output is "200 from every retrieval bot, 200 from every search-index bot, owner's call on the training bots". If you see 403s from ChatGPT-User, Claude-User, or Perplexity-User, something in your robots.txt, meta robots, X-Robots-Tag, or page-level noindex is blocking them, and one of the earlier checks should have told you which one.
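The general idea behind a user-agent audit can be sketched in a few lines of Python. This is not the tool's implementation, only an illustration under assumptions: the user-agent values are bare placeholder tokens (real bots send longer strings that each vendor documents), urllib follows redirects so a redirect shows up as its final status, and the sketch reports HTTP status codes only. It does not read robots.txt or meta robots, which Checks 1 and 3 cover.

```python
import urllib.error
import urllib.request
from typing import Callable

# Placeholder tokens (assumption); substitute each vendor's documented string.
RETRIEVAL_AGENTS = ["ChatGPT-User", "Claude-User", "Perplexity-User"]


def fetch_status(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Request url with the given User-Agent and return the HTTP status code."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # 403, 404, etc. arrive as exceptions


def audit(url: str, agents: list[str],
          fetch: Callable[[str, str], int] = fetch_status) -> dict[str, str]:
    """Map each agent to 'allowed' (HTTP 200) or 'blocked (<status>)'."""
    report = {}
    for agent in agents:
        status = fetch(url, agent)
        report[agent] = "allowed" if status == 200 else f"blocked ({status})"
    return report


if __name__ == "__main__":
    for agent, verdict in audit("https://yoursite.com/", RETRIEVAL_AGENTS).items():
        print(f"{agent}: {verdict}")
```

The fetch parameter exists so the reporting logic can be exercised with a stub instead of live requests.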

§ 07 Interpret

Interpreting the result

If all five checks return the expected state, the site is AI-visible at the configuration layer. That is necessary, not sufficient. Content quality, schema, internal linking, and entity wiring decide whether AI engines cite you, not just whether they can read you. The five-check pass clears the floor; the rest of the pillar covers the ceiling.

A clean pass on this diagnostic puts you in the same state as roughly forty percent of the Squarespace sites we audit, which is the baseline. Citation visibility from that baseline depends on the content layer: the 134-167 word self-contained answer block per H2, named-source citation density, dated claims, founder-entity Person schema, and llms.txt where it adds value⁷.

The pillar covers the full ceiling, the llms.txt cluster covers the Squarespace-specific workaround for that file, and the AI Crawlers hub covers everything else in this cluster. If you'd rather skip the manual work and have the install done for you, the SquareRank install completes the configuration layer plus the content and schema layer in seven business days for $299, refundable for fourteen.

