Published · Verified every 6 weeks · Sources: 4 named · Authored by SquareRank Team
Google-Extended on Squarespace, Explained Honestly
Google-Extended is the training opt-out token for future Gemini models and Vertex AI grounding development, and nothing more. Google's own documentation states verbatim that it does not impact a site's inclusion in Google Search nor is it used as a ranking signal [1]. On Squarespace, the token is one of 26 bots controlled by the AI exclusion checkbox at Settings → Crawlers, sitting at entry #14 alphabetically [2]. The decision is about values, not visibility.
This leaf walks through what Google-Extended actually controls, where the Squarespace checkbox lives, and the short decision tree most owners apply in about ninety seconds. It also covers the verification step — checking robots.txt to confirm the posture took effect — because the checkbox UI does not show you the underlying rule it writes.
§01 The short answer
TL;DR — block it or allow it, your Gemini citations look the same either way
The Google-Extended decision is small. The token exists to give publishers a single robots.txt directive that opts content out of training future Gemini models and the grounding development inside Vertex AI. It is not the bot that fetches pages for live Gemini answers — that is Googlebot, the same crawler that runs classical Search. So the practical Gemini-visibility impact of toggling Google-Extended on a Squarespace site is zero. The decision is whether you want your editorial work in Google's training corpus, which is a values question rather than an SEO question.
Most Squarespace owners discover the checkbox the same way: they read a 2024-era blog post telling them to block AI bots, panic-flip the AI exclusion switch, and then quietly worry they have torpedoed their AI-search visibility. The honest answer is that the switch controls 26 bots in one go, most of which are training crawlers; flipping it is not the disaster the worried blog posts suggest, and it is not the magic visibility lift the breathless ones suggest either. Read the canonical quote in the next section, decide, move on.
§02 What Google-Extended is
What Google-Extended actually controls
Google-Extended is a standalone robots.txt token introduced to give publishers a separate, granular control for AI training without affecting Search inclusion. The token has no dedicated HTTP user-agent; Google's existing crawlers carry the page, and the Google-Extended directive in robots.txt tells Google whether the fetched content can be used to train future Gemini models and to develop grounding inside Vertex AI for Gemini. Crucially, the token does not govern live Gemini answers today. Live Gemini grounding reads Google Search results, and the crawler that populates Google Search results is Googlebot.
The architectural reason matters. When Google decided to give publishers a training opt-out, the team had two choices: build a new crawler with its own user-agent, or layer a directive on top of the existing crawl. They chose the second route, which means Google-Extended is a robots.txt control surface rather than a crawler in the traditional sense. The practical effect is identical — your robots.txt can disallow Google-Extended and Google will respect that — but the mental model is different. You are not blocking a new bot from your site; you are flagging your existing content as off-limits for one specific downstream use.
Gemini's grounding documentation [3] confirms the architecture. The grounding tool fires Google Search queries, reads the results, and synthesises an answer with inline citations. The pipeline does not include a separate Gemini crawler step. That is why the Google-Extended posture is a training decision, not a citation decision.
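The control-surface model described in this section can be sketched as a robots.txt fragment in which Googlebot keeps crawling while the Google-Extended token opts the same content out of training. The file contents below are an illustrative assumption, not a copy of what Squarespace writes, and `sample_robots` and `list_agents` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Illustrative robots.txt: Search crawling stays open, training use is opted out.
# Assumed layout for illustration only; the file Squarespace emits may differ.
sample_robots() {
  cat <<'EOF'
User-agent: Googlebot
Allow: /

User-agent: Google-Extended
Disallow: /
EOF
}

# Print every user-agent token named on stdin, one per line.
list_agents() {
  grep -i '^user-agent:' | sed 's/^[^:]*:[[:space:]]*//'
}

# Show which tokens the sample file addresses.
sample_robots | list_agents
```

Both groups sit in the same file and are read independently, which is why disallowing Google-Extended leaves the Googlebot group untouched.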
The Google-Extended math
0 ranking-signal effect on Google Search from blocking Google-Extended, per Google's own documentation.
One sentence in Google's documentation removes most of the anxiety this token generates. The page on Google common crawlers states directly: 'Google-Extended does not impact a site's inclusion in Google Search nor is it used as a ranking signal in Google Search.' That sentence is the answer to almost every Squarespace forum thread about the AI exclusion checkbox. If your worry is that blocking Google-Extended hurts Search rankings, your appearance in AI Overviews, or Gemini citations, the worry is unfounded according to the source operating the system.
The sentence is worth keeping on hand because the surrounding noise is loud. Multiple 2024 blog posts treat the checkbox as a critical SEO decision in either direction — turn it on and lose visibility, turn it off and let Google steal your content. Neither framing matches the documentation. The token exists precisely so publishers can make a granular training decision without that decision spilling into Search ranking. Hold the quote against any contradicting source and the contradicting source is the one to discount.
§04 The Squarespace UI
Where the checkbox lives on a Squarespace site
Open the Squarespace dashboard. Navigate to Settings → Crawlers. The first checkbox controls search-engine crawlers (Googlebot, Bingbot, and friends) and should be on. The second checkbox is labelled 'Block known artificial intelligence crawlers' and that is the one that controls Google-Extended along with 25 other named bots in a single toggle. Default state on a new site is unchecked, meaning all 26 bots are allowed. Toggling it on writes Disallow rules for all 26 bots into the live robots.txt; toggling it off removes them.
The granularity is the awkward part. Squarespace ships one switch for all 26 bots rather than per-bot toggles, so you cannot block Google-Extended while continuing to allow GPTBot, for example. The full 26-bot list [2] includes Google-Extended at entry #14 alphabetically, alongside GPTBot, ClaudeBot, CCBot, Bytespider, FacebookBot, and others. The sister cluster on AI Crawlers covers the per-bot map in detail; this leaf covers the Google-Extended decision specifically.
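Because the switch is all-or-nothing, every AI token should report the same state in robots.txt. A minimal sketch of that audit follows, assuming a locally saved copy of the file. `is_blocked` and `audit_bots` are hypothetical helper names, the list covers only the six tokens named above rather than the full 26, and wildcard (`User-agent: *`) groups are deliberately ignored.

```shell
#!/usr/bin/env bash
# Report, per AI bot token, whether a saved robots.txt disallows the whole site.
# Only the six tokens named in the text are listed; the full list has 26.
BOTS="Google-Extended GPTBot ClaudeBot CCBot Bytespider FacebookBot"

# is_blocked TOKEN FILE -> exit 0 if TOKEN's group contains "Disallow: /".
# Consecutive User-agent lines are treated as one group sharing the rules below them.
is_blocked() {
  awk -v want="$1" '
    { sub(/\r$/, ""); sub(/#.*/, "") }          # drop CR and trailing comments
    /^[ \t]*$/ { next }                         # skip blank lines
    {
      colon = index($0, ":")
      if (colon == 0) next
      key = tolower(substr($0, 1, colon - 1))
      val = substr($0, colon + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", key)
      gsub(/^[ \t]+|[ \t]+$/, "", val)
      if (key == "user-agent") {
        if (in_rules) { in_group = 0; in_rules = 0 }   # rules seen: new group starts
        if (tolower(val) == tolower(want)) in_group = 1
      } else if (key == "allow" || key == "disallow") {
        in_rules = 1
        if (in_group && key == "disallow" && val == "/") found = 1
      }
    }
    END { exit(found ? 0 : 1) }
  ' "$2"
}

# audit_bots FILE -> one "token: blocked|allowed" line per token in BOTS.
audit_bots() {
  local file="$1" bot
  for bot in $BOTS; do
    if is_blocked "$bot" "$file"; then
      echo "$bot: blocked"
    else
      echo "$bot: allowed"
    fi
  done
}

# Run the audit when a file path is passed as the first argument.
if [ -n "${1:-}" ]; then
  audit_bots "$1"
fi
```

To try it, save the live file first with `curl -s https://yoursite.com/robots.txt -o robots.txt` and pass `robots.txt` as the argument. A mixed result (some tokens blocked, others allowed) would suggest the file was edited outside the checkbox or that the token list here is out of date.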
§05 The decision
Should you turn it on? A short decision tree
The decision is not a visibility question because Google's documentation already answered that. It is a values question with two reasonable answers. Allow Google-Extended (the default Squarespace ships) if you want your content available for training future Gemini models and grounding development. Block Google-Extended if you would rather your content stayed out of those training corpora. There is no third answer that gives you a Gemini citation boost in 2026, because the token does not gate Gemini citation eligibility.
Three sub-questions resolve the choice for most owners.
Is your editorial output your product? If you sell research, analysis, or proprietary how-to content as the primary revenue line, the case for blocking Google-Extended is stronger. Treat the training opt-out the way news publishers do.
Are you a service business using content to drive leads? If the editorial work is a top-of-funnel layer for a service business (the SquareRank case), allowing Google-Extended is usually fine. Future Gemini citations may surface your brand indirectly; the marginal training risk is small.
Do you publish original primary research? If yes, you may want a per-claim approach — allow the foundational educational content, block the cornerstone research. Squarespace's single-checkbox UI cannot ship that granularity, which is a real platform limitation; document the workaround in the AI Crawlers cluster.
§06 Verification
Verify the posture took effect
The Squarespace UI does not show you the live robots.txt content, which means the checkbox-to-rule mapping has to be verified manually. Open a private browser window. Visit yoursite.com/robots.txt. If you ticked the box, expect to see a Disallow rule for User-agent: Google-Extended (alongside 25 similar rules for the other AI bots). If you left it unchecked, no such rule should appear. The check takes ten seconds and removes any ambiguity about whether the posture is actually live.
Two gotchas worth noting. First, Squarespace caches robots.txt aggressively; if you just toggled the box, give it five to ten minutes before checking. Second, robots.txt is a request, not an enforcement mechanism — Google has historically respected the Google-Extended directive, but the directive itself is voluntary on the crawler side. The behaviour to verify is that Squarespace wrote the rule, not that Google itself stops crawling; the latter you cannot directly observe.
Sixty-second robots.txt verification (bash)
```bash
# Fetch the live robots.txt and show the Google-Extended rule, if any.
# -A 1 also prints the line after each match, so the Disallow rule is visible.
curl -s https://yoursite.com/robots.txt | grep -i -A 1 "google-extended"

# Expected if AI checkbox is on:
#   User-agent: Google-Extended
#   Disallow: /
# Expected if AI checkbox is off (Squarespace default):
#   No output, because no Google-Extended-specific rule exists.
```
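Given the caching delay noted above, a single fetch right after toggling the box can mislead. The following is a rough sketch of a polling check, not a definitive tool: `SITE`, `ATTEMPTS`, and `PAUSE` are placeholder settings, `fetch_robots` and `wait_for_rule` are hypothetical names, and the defaults simply spread the fetches over roughly the five-to-ten-minute window mentioned above.

```shell
#!/usr/bin/env bash
# Poll robots.txt until the Google-Extended rule appears, or give up.
# SITE, ATTEMPTS and PAUSE are placeholders; adjust them for your own site.
SITE="${SITE:-https://yoursite.com}"
ATTEMPTS="${ATTEMPTS:-10}"   # number of fetches before giving up
PAUSE="${PAUSE:-60}"         # seconds to wait between fetches

# Fetch the live file; kept as a function so it can be swapped out when testing.
fetch_robots() {
  curl -fsS "$SITE/robots.txt"
}

# Succeeds as soon as a "User-agent: Google-Extended" line is present.
wait_for_rule() {
  local i
  for ((i = 1; i <= ATTEMPTS; i++)); do
    if fetch_robots | grep -qi '^user-agent:[[:space:]]*google-extended'; then
      echo "rule present after $i fetch(es)"
      return 0
    fi
    # Sleep only when another attempt is still to come.
    if ((i < ATTEMPTS)); then sleep "$PAUSE"; fi
  done
  echo "rule not seen after $ATTEMPTS fetch(es)" >&2
  return 1
}
```

Call `wait_for_rule` after ticking the checkbox. As with the one-line check, a success only confirms that Squarespace wrote the rule, not how Google behaves afterwards.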
With the posture verified, you can return to the cluster hub and move on to the three layers that actually move Gemini citation behaviour: freshness, multimodal assets, and section-extractable passages. The Google-Extended decision is small. The other three layers are where the work lives.