§ 01 The short answer

TL;DR — three signals that travel together

Alt text, file name, and ImageObject JSON-LD are the three image-level signals Gemini's grounding pipeline can read. Each one is small on its own; together they give the model enough context to connect a page's visual to its text without ambiguity. On Squarespace, the alt-text and file-name pieces work on every plan; the ImageObject block requires Code Injection (Business plan or above). The discipline is to ship all three on hero images and key illustrations, not on every decorative image on the page.

The work is fastest when batched. Take a pass through your top five editorial pages: set the file name and alt text on each page's hero image, then paste the ImageObject block into that page's header Code Injection. Expect about fifteen minutes for the first page and about three minutes for each page after that, once the JSON-LD template is on your clipboard.

§ 02 The mechanism

Why Gemini reads images natively, not via OCR

The Gemini Embedding 2 model processes images at the visual level rather than running OCR over them. The practical implication is that the model can connect a hero image's content (a screenshot of the Squarespace AI Visibility panel, say) to a paragraph that references the panel, even though the panel itself contains text that OCR would garble. The multimodal capability raises the importance of alt text and ImageObject schema because those signals are the way the page tells the model what the image is — and the model uses that explicit signal to confirm or correct its visual interpretation.

Google's image understanding documentation² describes the capability directly. Multimodal content moves through the embedding pipeline as a first-class input, alongside the text on the page. When the grounding tool returns search results that include image-rich pages, the model can extract meaningful information from those images even when the visible text on the page does not describe them. Pages that ship descriptive alt text and ImageObject metadata help the model converge on the correct interpretation faster, which raises citation likelihood for queries where visual context matters.

The multimodal layer, in shape

3 image-level signals that pay: alt text, file name, ImageObject schema. Ship all three on hero images. (Schema.org · 2026)

0 OCR steps in Gemini's multimodal pipeline: images are read at the visual level via Embedding 2. (Google · 2026)

Top 5 pages where the multimodal layer earns its install time. Decorative images on inner pages do not need it. (Google · 2026)

§ 03 Alt text

Alt text discipline on Squarespace

Squarespace exposes two places to describe an image: the alt-text field on the image block, which appears when you click an image in the editor, and the file caption attached to an image uploaded through the Files panel. The image-block alt field is the one Gemini and screen readers read; the file caption is metadata in the Files panel and does not propagate to the live HTML by itself. Set block-level alt text on every image that carries informational weight. Decorative images (background textures, divider lines) can be left with empty alt; that is the correct accessibility behaviour, not laziness.

The pattern for good alt text is descriptive without being keyword-stuffed. 'Squarespace AI Visibility panel showing five branded prompts and two pending non-branded prompts' carries information about what the image actually shows; 'Squarespace SEO best practices' is keyword bait that adds no information for either the engine or a screen-reader user. Gemini's multimodal layer rewards the former and penalises (or at least ignores) the latter.

A useful test: read your alt text aloud while the image is hidden. If a listener could form a mental picture of what the image shows, the alt text is doing its job. If they could not, the alt text is too vague or too keyword-focused.
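The read-aloud test is a human judgement, but the mechanical half of an alt-text audit (which images have no alt attribute, which have an empty one, which have a suspiciously short one) can be scripted. The sketch below runs Python's standard-library HTML parser over a page's HTML. The five-word threshold is an arbitrary assumption, not a Gemini or Squarespace rule, and the script cannot tell whether the text actually describes the image.

```python
from html.parser import HTMLParser


class AltTextAudit(HTMLParser):
    """Record every <img> on a page with a rough verdict on its alt text."""

    def __init__(self, min_words: int = 5):
        super().__init__()
        self.min_words = min_words  # arbitrary threshold, not a Google rule
        self.findings: list[tuple[str, str]] = []  # (image URL, verdict)

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src") or attributes.get("data-src") or "(no src)"
        if "alt" not in attributes:
            verdict = "missing alt attribute"
        elif not (attributes["alt"] or "").strip():
            # Correct for decorative images, a gap for informational ones.
            verdict = "empty alt (fine only if decorative)"
        elif len(attributes["alt"].split()) < self.min_words:
            verdict = "alt text probably too short to describe the image"
        else:
            verdict = "ok"
        self.findings.append((src, verdict))


page = """
<img src="/hero.jpg" alt="Squarespace AI Visibility panel showing five branded prompts">
<img src="/texture.png" alt="">
<img src="/promo.jpg" alt="Squarespace SEO best practices">
<img src="/chart.png">
"""

audit = AltTextAudit()
audit.feed(page)
audit.close()
for src, verdict in audit.findings:
    print(f"{src}: {verdict}")
```

Empty alt is reported rather than treated as an error because it is the right choice for purely decorative images; only you can decide which images those are.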

§ 04 File names

The file-name pattern that survives the CDN

Squarespace serves images through a CDN that often renames uploaded files to hash-style names, which is useful for cache busting and less useful for image SEO. The workaround is to give the file a descriptive name on your local machine before uploading. The CDN-renamed URL persists for as long as the file exists, but the local name survives in the file metadata Squarespace exposes, and it is sometimes preserved in the served URL when you upload directly rather than through the editor's image picker.

The pattern: hyphen-separated, lowercase, descriptive of the image's actual content, roughly four to eight words. squarespace-gemini-grounding-pipeline.jpg beats img_2358.jpg by a wide margin. The convention also helps with internal organisation: when you need to locate an image six months later in the Files panel, descriptive names are searchable in a way hash strings are not.
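Naming files by hand tends to drift across a batch. A small helper can turn a plain-English description into the hyphen-separated, lowercase pattern before upload and flag camera defaults such as img_2358.jpg. This is a sketch: the function names are hypothetical, the camera-default pattern is an assumption, and folding accented characters to ASCII is a choice of this example rather than a Squarespace requirement.

```python
import re
import unicodedata

# Hypothetical pattern for camera and phone defaults such as IMG_2358 or DSC01234.
CAMERA_DEFAULT = re.compile(r"(img|dsc|dscn|image|photo)[_-]?\d+", re.IGNORECASE)


def descriptive_filename(description: str, extension: str = "jpg") -> str:
    """Turn a plain-English description into a hyphen-separated, lowercase file name."""
    # Fold accented characters to plain ASCII (a choice of this sketch).
    ascii_text = (
        unicodedata.normalize("NFKD", description)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Runs of letters and digits become words; everything else is a separator.
    words = re.findall(r"[a-z0-9]+", ascii_text.lower())
    if not words:
        raise ValueError("description contains no usable words")
    return "-".join(words) + "." + extension.lstrip(".").lower()


def word_count(filename: str) -> int:
    """Count the hyphen-separated words in the name, ignoring the extension."""
    stem = filename.rsplit(".", 1)[0]
    return len([word for word in stem.split("-") if word])


def is_camera_default(filename: str) -> bool:
    """True for names like img_2358.jpg that say nothing about the image."""
    return CAMERA_DEFAULT.fullmatch(filename.rsplit(".", 1)[0]) is not None


name = descriptive_filename("Squarespace Gemini grounding pipeline")
print(name, word_count(name), is_camera_default(name))
print(is_camera_default("img_2358.jpg"))
```

The word count is reported rather than enforced, so you can apply the rough four-to-eight-word guideline with judgement.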

Direct uploads through the Files panel (Settings → Files) preserve the original file name more reliably than editor uploads. For high-value hero images on top pages, upload through the Files panel first and reference the URL from the image block. The extra step costs about thirty seconds per image and produces a CDN URL that retains the descriptive slug.
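After uploading, it is worth confirming that the URL actually served on the live page still ends in the descriptive name. The sketch below compares the last path segment of a served URL with your local file name and flags camera-default or long hex-string names. The example URLs are placeholders rather than real Squarespace CDN paths, and the "undescriptive" patterns are assumptions to tune against your own uploads.

```python
import posixpath
import re
from urllib.parse import unquote, urlparse

# Assumed signs of a non-descriptive name: camera defaults or long hex hashes.
UNDESCRIPTIVE = re.compile(r"(img|dsc|dscn|image|photo)[_-]?\d+|[0-9a-f]{16,}", re.IGNORECASE)


def served_file_name(url: str) -> str:
    """Last path segment of an image URL, with the query string dropped."""
    return unquote(posixpath.basename(urlparse(url).path))


def keeps_descriptive_slug(url: str, local_name: str) -> bool:
    """True if the served URL still ends in the descriptive name you uploaded."""
    served = served_file_name(url)
    stem = served.rsplit(".", 1)[0]
    if UNDESCRIPTIVE.fullmatch(stem):
        return False
    return served.lower() == local_name.lower()


# Placeholder URLs for illustration; paste real ones from your live page.
print(keeps_descriptive_slug(
    "https://cdn.example.com/content/abc123/squarespace-gemini-grounding-pipeline.jpg?format=1500w",
    "squarespace-gemini-grounding-pipeline.jpg",
))
print(keeps_descriptive_slug(
    "https://cdn.example.com/content/abc123/IMG_2358.JPG",
    "squarespace-gemini-grounding-pipeline.jpg",
))
```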

§ 05 ImageObject schema

ImageObject JSON-LD on hero images

ImageObject is the structured-data type for images. Inject an ImageObject JSON-LD block in Page Settings → Advanced → Code Injection → Header on each page whose hero image carries real informational weight. The block declares the image's URL, caption, creator, and the representativeOfPage flag that tells Google this image stands for the page's content. The pattern pairs with the Article JSON-LD from the freshness leaf — the Article schema's image field references the same URL, creating the text-to-visual entity handshake Gemini's multimodal layer reads.

Schema.org's ImageObject specification³ defines the available properties. Five are worth shipping: contentUrl (the actual image URL), caption (a short description matching the alt text), creator (a Person, usually the founder, with a URL), representativeOfPage (boolean true for hero images), and license (a URL to your license terms, included only if you have them).

JSON-LD ImageObject block for the page hero. Paste into Page Settings → Advanced → Code Injection → Header.

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "ImageObject",
  "contentUrl": "https://yoursite.com/images/squarespace-gemini-hero.jpg",
  "caption": "Squarespace AI Visibility panel showing branded and non-branded prompt results",
  "creator": {
    "@type": "Person",
    "name": "Founder Name",
    "url": "https://yoursite.com/founder/"
  },
  "representativeOfPage": true,
  "width": "1600",
  "height": "900"
}
</script>

Validate via the Rich Results Test after pasting. ImageObject does not trigger a visible rich-result preview on general editorial pages, but the test confirms the JSON parses without errors and the properties are recognised.
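Once the template is settled, generating the block rather than hand-editing it avoids stray commas and mismatched quotes that would stop the JSON from parsing. The sketch below builds the same ImageObject shape as the example block from a few parameters and adds license only when you pass one, matching the "only if you have them" advice. The function name and placeholder values are hypothetical; the Rich Results Test remains the check that matters after pasting.

```python
import json
from typing import Optional


def image_object_script(
    content_url: str,
    caption: str,
    creator_name: str,
    creator_url: str,
    width: int,
    height: int,
    license_url: Optional[str] = None,
) -> str:
    """Build an ImageObject JSON-LD <script> tag for page-header Code Injection."""
    data = {
        "@context": "https://schema.org",
        "@type": "ImageObject",
        "contentUrl": content_url,
        "caption": caption,  # keep this in step with the alt text and visible caption
        "creator": {"@type": "Person", "name": creator_name, "url": creator_url},
        "representativeOfPage": True,
        "width": str(width),
        "height": str(height),
    }
    if license_url:  # ship license only when you actually have license terms
        data["license"] = license_url
    body = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "</" so a caption can never close the script tag early.
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


snippet = image_object_script(
    content_url="https://yoursite.com/images/squarespace-gemini-hero.jpg",
    caption="Squarespace AI Visibility panel showing branded and non-branded prompt results",
    creator_name="Founder Name",
    creator_url="https://yoursite.com/founder/",
    width=1600,
    height=900,
)
print(snippet)
```

Because json.dumps produces the text, the output always parses; what it cannot check is whether the URL and caption are the right ones for the page.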

§ 06 Captions

Captions and the entity handshake

A visible caption beneath the hero image adds a fourth signal alongside alt text, file name, and ImageObject schema. The caption text reads as visible page content rather than alt metadata, which gives the model an additional anchor for connecting the image to the surrounding text. Squarespace's image block exposes a caption field directly under the image; ship it on the hero image of every editorial page, ideally matching the language of the ImageObject caption property so the two reinforce each other.

The entity handshake is the move worth understanding. When the Article JSON-LD's image field references the same URL as the ImageObject block, when the visible caption matches the schema caption, and when the alt text describes the same thing in user-friendly language, the model receives four converging signals that all point at the same image. That convergence resolves ambiguity at the multimodal pipeline level, which raises the page's selection likelihood when Gemini's grounding needs an image-aware answer.
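The handshake can be checked before publishing. The sketch below takes the Article and ImageObject JSON-LD as Python dicts, plus the alt text and visible caption, and lists where they diverge. It assumes the visible and schema captions should match after case and whitespace normalisation, which is stricter than "ideally matching"; loosen that comparison if your visible caption is deliberately longer. The function name is hypothetical.

```python
def _normalise(text: str) -> str:
    """Lower-case and collapse whitespace so trivial differences do not count."""
    return " ".join(text.lower().split())


def handshake_mismatches(article: dict, image_object: dict, alt_text: str, visible_caption: str) -> list:
    """List the ways the image signals fail to point at the same image."""
    problems = []
    # Article.image may be a URL string, an ImageObject dict, or a list of either.
    image_field = article.get("image")
    candidates = image_field if isinstance(image_field, list) else [image_field]
    article_urls = {
        (c.get("contentUrl") or c.get("url")) if isinstance(c, dict) else c
        for c in candidates
        if c
    }
    article_urls.discard(None)
    if image_object.get("contentUrl") not in article_urls:
        problems.append("Article image does not reference the ImageObject contentUrl")
    if _normalise(visible_caption) != _normalise(image_object.get("caption", "")):
        problems.append("visible caption differs from the schema caption")
    if not alt_text.strip():
        problems.append("alt text is empty on an informational image")
    return problems


hero_url = "https://yoursite.com/images/squarespace-gemini-hero.jpg"
article = {"@type": "Article", "image": [hero_url]}
image_object = {
    "@type": "ImageObject",
    "contentUrl": hero_url,
    "caption": "Squarespace AI Visibility panel showing branded and non-branded prompt results",
}
print(handshake_mismatches(
    article,
    image_object,
    alt_text="Squarespace AI Visibility panel listing branded and non-branded prompts",
    visible_caption="Squarespace AI Visibility panel showing branded and non-branded prompt results",
))  # prints []
```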

§ 07 Verification

Verify the multimodal layer is live

Three checks confirm the layer is in place. First, view the page source and search for the ImageObject JSON-LD block — the script tag should be present in the page head. Second, run the page through Google's Rich Results Test and confirm zero errors on the ImageObject and Article blocks. Third, inspect the rendered image element in the browser's developer tools and confirm the alt attribute is populated and the src URL points to a recognisable image path.
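Checks one and three can be automated against the page's HTML. The sketch below collects the JSON-LD blocks inside the head, confirms each parses, finds the ImageObject, and looks for an img element whose file name matches contentUrl and whose alt attribute is populated. Matching on file name rather than the full URL, and reading data-src for lazy-loaded images, are assumptions about how your template renders images. It ignores JSON-LD that nests items under @graph and does not replace the Rich Results Test.

```python
import json
import posixpath
from html.parser import HTMLParser
from urllib.parse import urlparse


def file_name(url):
    """Last path segment of a URL, ignoring any query string."""
    return posixpath.basename(urlparse(url or "").path)


class PageSignals(HTMLParser):
    """Collect JSON-LD blocks inside <head> and the (src, alt) of every <img>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.in_json_ld = False
        self.chunks = []
        self.json_ld = []  # parsed JSON-LD objects found in <head>
        self.errors = []   # messages for JSON-LD blocks that failed to parse
        self.images = []   # (src, alt) pairs

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "head":
            self.in_head = True
        elif tag == "script" and self.in_head and a.get("type") == "application/ld+json":
            self.in_json_ld, self.chunks = True, []
        elif tag == "img":
            # Assumption: lazy-loading templates may put the URL in data-src.
            self.images.append((a.get("src") or a.get("data-src"), a.get("alt")))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False
        elif tag == "script" and self.in_json_ld:
            self.in_json_ld = False
            try:
                block = json.loads("".join(self.chunks))
            except json.JSONDecodeError as exc:
                self.errors.append(f"JSON-LD block does not parse: {exc}")
                return
            self.json_ld.extend(block if isinstance(block, list) else [block])

    def handle_data(self, data):
        if self.in_json_ld:
            self.chunks.append(data)


def verify_multimodal_layer(html):
    """Return a list of problems; an empty list means checks one and three pass."""
    page = PageSignals()
    page.feed(html)
    page.close()
    problems = list(page.errors)
    heroes = [b for b in page.json_ld if isinstance(b, dict) and b.get("@type") == "ImageObject"]
    if not heroes:
        problems.append("no ImageObject JSON-LD in <head>")
    for hero in heroes:
        wanted = file_name(hero.get("contentUrl"))
        if not wanted:
            problems.append("ImageObject has no contentUrl")
            continue
        alts = [alt for src, alt in page.images if src and file_name(src) == wanted]
        if not alts:
            problems.append(f"no <img> whose file name matches {wanted}")
        elif not any((alt or "").strip() for alt in alts):
            problems.append("hero image has empty alt text")
    return problems


sample = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "ImageObject",
 "contentUrl": "https://yoursite.com/images/squarespace-gemini-hero.jpg"}
</script>
</head><body>
<img src="https://yoursite.com/images/squarespace-gemini-hero.jpg?format=1500w"
     alt="Squarespace AI Visibility panel showing branded prompt results">
</body></html>"""
print(verify_multimodal_layer(sample))
```

To run it against a live page, fetch the server-rendered HTML first, for example with urllib.request.urlopen(url).read().decode("utf-8"). Markup that JavaScript inserts after load will not be visible to this check, which is one reason the browser's developer tools remain the final word for check three.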

Two follow-up checks for the patient owner. First, search Google Images for the descriptive caption text in quotes; once the image is indexed, which can take a few weeks, your page should appear in the results. Second, run a manual Gemini query at gemini.google.com using a question your hero image visually answers; if Gemini cites your page, the multimodal layer is working as intended. Neither check is binary, since absence does not prove failure, but presence is a strong signal the install is healthy.

With the multimodal layer in place, the cluster's four layers are complete. Loop back to the cluster hub for the full picture, or jump to the checklist for the 12-item ship list that ties Google-Extended, freshness, multimodal, and section-extractable passages together as one operational sequence.
