Type: Diagnostic · Sources: 7 named · Authored by: SquareRank Team
§ 1.7.4 · Cluster 1G · Diagnostic
Five Live Checks That Tell You Whether AI Can Read Your Squarespace Site
Five places decide whether AI engines can read your site right now: robots.txt, the Squarespace Crawlers panel, the page-level noindex toggle on each page's SEO tab, the X-Robots-Tag HTTP header, and the live response to each major AI bot's user-agent. Any owner can check all five in under five minutes.
This page walks through the checks in order, names the expected state in each one for a site that wants AI citations, and ends with a free crawler-check tool that automates the last step. The 90-second version: open robots.txt, look at the Crawlers panel, verify your top page's SEO settings, check the response headers, run the tool.
§01 The five
The five checks, in order
The diagnostic moves from the platform layer (robots.txt and the Crawlers panel) down to the page layer (the SEO tab on each page) and finally to the live-fetch layer (the X-Robots-Tag header and the actual crawler audit). Stopping early misses the most common cause of accidental AI blocks, which is a single page-level noindex on a flagship page.
Most Squarespace owners we audit have the panel-level settings correct and the page-level settings wrong. The Squarespace AI checkbox is unchecked by default [1], and most owners have not changed it. The page-level "Hide this page from search results" toggle, however, gets flipped by accident, usually when a designer hides a draft page and forgets to unhide a sibling. That single mistake takes a flagship page out of every AI surface in addition to Google Search, and it does not show up in robots.txt.
What this diagnostic catches: 5 places a single misconfiguration can take a Squarespace site out of AI search.
Check 1: open yoursite.com/robots.txt in a private window
The robots.txt file is the authoritative record of what your Squarespace site is currently telling crawlers. Open it in a private browser window (to bypass any cached copy) and search for 'User-agent: GPTBot'. If a 'Disallow: /' line follows it, the AI block is on. If GPTBot is not mentioned at all, the AI block is off.
The private window guards against a stale copy cached by your browser. Reading the file rather than the panel matters because Squarespace's UI sometimes lags the live robots.txt by a few minutes after a toggle change. The file is the truth; the panel is the UI's interpretation of the truth.
robots.txt: What you want to see if the goal is AI citations (excerpt)

    # Squarespace's default robots.txt for a site with the AI block OFF.
    # No AI user-agents are listed. Bots default to allow.
    User-agent: *
    Disallow: /config
    Disallow: /search
    Disallow: /account$
    # ...and other Squarespace system paths
    Sitemap: https://yoursite.com/sitemap.xml
If you do see User-agent: GPTBot followed by Disallow: / (and the same for the other 25 named bots), the AI block is currently on. That is the most common reason a Squarespace site that "should" be AI-visible is not. The fix is one checkbox in the Crawlers panel.
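The same reading of robots.txt can be sketched in Python with the standard library's `urllib.robotparser`. This is a minimal illustration, not part of any Squarespace tooling: the function name `ai_access` and the short `AI_AGENTS` list are assumptions made for the example, and the list is only a subset of the bots Squarespace names.

```python
from urllib.robotparser import RobotFileParser

# Illustrative subset of AI user-agents (an assumption for this sketch);
# the list Squarespace blocks is longer.
AI_AGENTS = [
    "GPTBot", "ChatGPT-User",
    "ClaudeBot", "Claude-User",
    "PerplexityBot", "Perplexity-User",
]


def ai_access(robots_txt: str, url: str = "https://yoursite.com/") -> dict[str, bool]:
    """Return {agent: allowed} for `url`, judged only by the robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in AI_AGENTS}


# Two hand-written samples: one with a GPTBot block, one without.
BLOCKED_SAMPLE = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nDisallow: /config\n"
OPEN_SAMPLE = "User-agent: *\nDisallow: /config\n"

for agent, allowed in ai_access(BLOCKED_SAMPLE).items():
    print(f"{agent}: {'allowed' in str(allowed) or allowed}")
```

To run it against a live site, paste the contents of yoursite.com/robots.txt into a string and pass it to `ai_access`. Any agent that maps to `False` is disallowed for the homepage by the file.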
§03 Check 2
Check 2: confirm the Crawlers panel matches the file
Open Settings > Crawlers in Squarespace. The 'Block known artificial intelligence crawlers' checkbox should match what you saw in robots.txt. If they disagree, save the panel again and refresh. A mismatch is rare but can happen after a UI bug or an interrupted save.
For a site that wants AI citations, the recommended state is: search-engine checkbox on (so Googlebot and Bingbot can crawl), AI checkbox off (so the 26-bot AI list remains allowed) [3]. That is what a fresh Squarespace site ships with, and it is what most sites should stay at.
Mismatches between the panel and the file usually self-resolve. If yours does not within ten minutes of toggling and saving, contact Squarespace support and quote the URL of the robots.txt file you observed.
§04 Check 3
Check 3: page-level noindex on your most important pages
Open the page settings on your homepage, your top blog post, and your most-trafficked landing page. Click into the SEO tab. Confirm 'Hide this page from search results' is unchecked. This is the most common accidental block on a Squarespace site, and it does not appear anywhere in robots.txt.
The "Hide this page from search results" toggle adds a <meta name="robots" content="noindex"> tag to the page's head [2]. All major AI retrieval bots (ChatGPT-User [4], Claude-User [5], Perplexity-User [6]) read meta robots and skip noindexed pages. A single accidental check on a flagship page removes that page from every AI surface as well as Google.
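If you would rather check the saved page source than click through each SEO tab, the tag can be detected with Python's built-in `html.parser`. This is a small sketch under stated assumptions: `RobotsMetaFinder` and `has_noindex` are names invented for the example, and the value `none` is treated as a block because it is the standard shorthand for `noindex, nofollow`.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None for bare attributes, so default to "".
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() == "robots":
            content = attr.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",") if d.strip()]


def has_noindex(html: str) -> bool:
    """True if the HTML carries a meta robots tag with noindex (or none)."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return any(d in ("noindex", "none") for d in finder.directives)
```

Save the page source from your browser (View Source, then copy) and pass it in as a string. A `True` result on the homepage, top blog post, or main landing page means the toggle is on for that page.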
§05 Check 4
Check 4: X-Robots-Tag in the response headers
The X-Robots-Tag HTTP header is the meta robots equivalent at the server response layer. Squarespace does not set it by default, but a layer in front of the site can. Open your browser's dev tools, switch to the Network tab, reload the page, click the document request, and look at the response headers. If X-Robots-Tag appears with noindex (or a non-standard value such as noai), treat the page as blocked regardless of what the earlier checks showed.
The reason this check exists is that a Squarespace site can pick up server-side directives indirectly, for example through edge workers, reverse proxies, or third-party performance plugins. Code Injection itself only changes the HTML, so it cannot set a response header on its own. Header directives do not show up in the visible HTML, and they do not show up in robots.txt. They only show up if you look at the response headers.
For most Squarespace sites, this header is empty or absent, which is the expected state. If it is set and you do not remember setting it, the most likely cause is a third-party CDN or analytics integration that adds it for caching or privacy reasons. Trace it back to the integration and decide whether to keep it.
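The header check can also be expressed as a few lines of Python that read a dictionary of response headers. This is a hedged sketch: the function names are hypothetical, `noai` is included only because this page mentions it (it is not a standardized directive), and a bot-scoped value such as `googlebot: noindex` is flagged for review even though it targets one crawler.

```python
# Values that should make you stop and investigate. "noai" is non-standard
# and is listed here only because the article names it.
BLOCKING_VALUES = {"noindex", "none", "noai"}


def x_robots_directives(headers: dict[str, str]) -> list[str]:
    """Return the comma-separated X-Robots-Tag directives, lowercased."""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":  # header names are case-insensitive
            return [part.strip().lower() for part in value.split(",") if part.strip()]
    return []


def header_blocks_indexing(headers: dict[str, str]) -> bool:
    """True if any directive, with an optional 'botname:' prefix removed, is a blocking value."""
    for directive in x_robots_directives(headers):
        value = directive.split(":")[-1].strip()
        if value in BLOCKING_VALUES:
            return True
    return False
```

You can copy the headers by hand from the Network tab into a dict, or build one from a `urllib.request.urlopen(url)` response with `dict(response.headers.items())`. An empty list from `x_robots_directives` is the expected state for most Squarespace sites.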
§06 Check 5
Check 5: run the free crawler-check tool
The final check is the automated one. The free crawler-check tool at /tools/crawler-check/ impersonates each major AI bot's user-agent string, requests your homepage, and reports back which ones the site allows. It takes about sixty seconds and requires no signup. The output names every bot that gets a 200 and every bot that gets a 403 or a redirect.
The tool covers the documented user-agents from OpenAI [4], Anthropic [5], Perplexity [6], Apple, Google, Meta, ByteDance, Common Crawl, and Mistral. It does not impersonate the stealth user-agents Cloudflare reported on in August 2025; those are a different problem that no site-level tool can audit reliably.
For a site whose goal is AI citations, the desired output is "200 from every retrieval bot, 200 from every search-index bot, owner's call on the training bots". If the tool reports a block for ChatGPT-User, Claude-User, or Perplexity-User, something in your robots.txt, page-level noindex (meta robots), or X-Robots-Tag header is responsible, and one of the earlier checks should have told you which one.
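For readers curious what a user-agent audit looks like in principle, here is a minimal Python sketch. It is not the crawler-check tool's implementation. The user-agent strings are simplified placeholders (assumptions; each vendor documents its exact string), and the function names are invented for the example. Note that a status code only reveals server-level refusals: robots.txt rules and noindex tags do not change the status a bot receives, so this complements Checks 1 through 4 rather than replacing them.

```python
import urllib.error
import urllib.request
from typing import Callable

# Simplified placeholder strings (assumptions), not the vendors' exact values.
BOT_USER_AGENTS = {
    "GPTBot": "Mozilla/5.0 (compatible; GPTBot/1.0)",
    "ChatGPT-User": "Mozilla/5.0 (compatible; ChatGPT-User/1.0)",
    "ClaudeBot": "Mozilla/5.0 (compatible; ClaudeBot/1.0)",
    "Claude-User": "Mozilla/5.0 (compatible; Claude-User/1.0)",
    "PerplexityBot": "Mozilla/5.0 (compatible; PerplexityBot/1.0)",
    "Perplexity-User": "Mozilla/5.0 (compatible; Perplexity-User/1.0)",
}


def fetch_status(url: str, user_agent: str) -> int:
    """Request `url` with the given User-Agent and return the HTTP status.

    urlopen follows redirects, so a redirect shows up as the final status.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 403 when a firewall refuses the bot


def audit(url: str, fetch: Callable[[str, str], int] = fetch_status) -> dict[str, int]:
    """Return {bot name: HTTP status} for every bot in BOT_USER_AGENTS."""
    return {bot: fetch(url, ua) for bot, ua in BOT_USER_AGENTS.items()}
```

Calling `audit("https://yoursite.com/")` performs one live request per bot. The `fetch` parameter exists so the logic can be exercised without network access by passing a stand-in function.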
§07 Interpret
Interpreting the result
If all five checks return the expected state, the site is AI-visible at the configuration layer. That is necessary, not sufficient. Content quality, schema, internal linking, and entity wiring decide whether AI engines cite you, not just whether they can read you. The five-check pass clears the floor; the rest of the pillar covers the ceiling.
A clean pass on this diagnostic puts you in the same state as roughly forty percent of the Squarespace sites we audit, which is the baseline. Citation visibility from that baseline depends on the content layer: the 134-to-167-word self-contained answer block per H2, named-source citation density, dated claims, founder-entity Person schema, and llms.txt where it adds value [7].
The pillar covers the full ceiling, the llms.txt cluster covers the Squarespace-specific workaround for that file, and the AI Crawlers hub covers everything else in this cluster. If you'd rather skip the manual work and have the install done for you, the SquareRank install completes the configuration layer plus the content and schema layer in seven business days for $299, refundable for fourteen.