The Schema-Match Audit That Lifted Our AI Overview Citation Share From Zero to 23 Percent in 90 Days

Authored by: Kartik Chugh

When I pulled our weekly AI Overview citation report for redditapis.com in March 2026, the chart sat at zero across every target query. The site ranked top five on Google organic for every one of those queries. It had been indexed for months. Perplexity, ChatGPT search, and Google AI Overview were all citing other sources for the same questions. The mismatch between organic rank and AI Overview citation share was big enough that I assumed the data pull was broken. It was not.

The audit that followed broke into a sequence I had not seen written down anywhere, and it produced a 5-tier ladder we now ship for every cohort client.

The first tier was a schema-match problem we owned

The FAQs on redditapis.com lived in WordPress accordion blocks. We had articles marked up with Article schema. We had Organization, WebSite, and BreadcrumbList schemas in the head. What we did not have was FAQPage schema on any of the accordion blocks, even though the questions inside were written exactly the way a user would phrase them to ChatGPT.

The deeper bug was a schema-to-content mismatch I had not anticipated. On the pages that did have FAQPage schema (we had retrofitted a handful), the question text in the markup said "what is the Reddit API" and the voice-query phrasing we wanted to match in AI Overview was "how do I use the Reddit API for X." Same topic, completely different question stem. As far as we could tell, models pull question-shaped chunks by exact match on the question stem and then lift the first 80 characters of the answer. We were matching nothing.

The fix was mechanical. I rebuilt every FAQ block so the question stem in the markup was the exact phrasing pulled from Google's People Also Ask, Search Console queries, and Perplexity follow-up suggestions. Direct answer in the first 80 characters of the response. Everything after became context.
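A minimal sketch of that rebuild in Python, assuming the FAQ content is available as (question, answer) pairs. The function names, the sample question, and the placeholder answer are illustrative and not taken from our codebase. The property names (FAQPage, mainEntity, Question, acceptedAnswer, Answer) are the standard schema.org vocabulary.

```python
import json

ANSWER_LEAD_CHARS = 80  # the window this article treats as the citable chunk


def build_faq_jsonld(pairs):
    """Build a schema.org FAQPage dict from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


def answer_lead(answer):
    """Return the part of the answer that falls inside the 80-character window."""
    return answer[:ANSWER_LEAD_CHARS]


def stems_missing_from_markup(faq, target_stems):
    """Return target stems with no exact (case-insensitive) match in the markup."""
    marked_up = {q["name"].strip().lower() for q in faq["mainEntity"]}
    return [s for s in target_stems if s.strip().lower() not in marked_up]


# Hypothetical data mirroring the mismatch described above.
faq = build_faq_jsonld([("What is the Reddit API?", "Placeholder answer text.")])
targets = ["How do I use the Reddit API for X?"]

print(stems_missing_from_markup(faq, targets))
# -> ['How do I use the Reddit API for X?']
print(json.dumps(faq, indent=2))  # paste into a JSON-LD script tag
```

Running the stem check against the list of phrasings collected from People Also Ask and Search Console gives a quick to-do list of questions whose markup still needs rewording.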

What I did not understand at rollout was that the first 80 characters is functionally the chunk window. Everything else is for ranking, not for citation. Once we stopped front-loading filler, the citation share moved from zero to 8 percent inside 30 days on the pages we had restructured.

The second tier was a claims registry we should have built earlier

Around week 5 of the audit I noticed a pattern. AI Overview kept citing a competitor for a query where our content had the same number. Their content was 600 words and a thinner page. Ours was 2400 words and a deeper page. The model preferred theirs.

Reading both side by side, the difference was named source attribution. They wrote "according to our 2025 customer survey of 412 SaaS founders, X." We wrote "X." The data was the same, but theirs got the citation because the model could attach a verifiable source to the claim.

We built a claims registry in Notion. Every numeric claim we publish gets a registry entry with source name, methodology, sample size, measurement window, and a runbook URL. Every published piece links the registry entry inline. The bug was that we had been treating our internal data as obvious. The model needs it to be cited, not obvious.
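Our registry lives in Notion, so the following is only a sketch of the entry shape in Python, assuming one record per numeric claim. The class name, field names, helper functions, and the example.com URL are hypothetical; the fields themselves are the ones listed above.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ClaimEntry:
    claim: str               # the numeric claim as published
    source_name: str         # named source the reader can verify
    methodology: str
    sample_size: int
    measurement_window: str
    runbook_url: str         # where the measurement procedure is documented


def missing_fields(entry):
    """Names of fields left empty, plus sample_size if it is not positive."""
    missing = []
    for f in fields(entry):
        value = getattr(entry, f.name)
        if value in ("", None) or (f.name == "sample_size" and value <= 0):
            missing.append(f.name)
    return missing


def attributed(entry):
    """Render the claim with its source attached, ready to drop into copy."""
    return (
        f"According to {entry.source_name} "
        f"(n={entry.sample_size}, {entry.measurement_window}), {entry.claim}"
    )


# Placeholder values only; none of these are real figures.
entry = ClaimEntry(
    claim="X",
    source_name="our internal usage data",
    methodology="placeholder methodology",
    sample_size=100,
    measurement_window="placeholder window",
    runbook_url="https://example.com/runbooks/claim-001",
)
print(missing_fields(entry))  # -> []
print(attributed(entry))
```

Refusing to publish any claim whose entry reports missing fields is one way to keep the registry from drifting back into "obvious, not cited" territory.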

The eight pages we re-linked to the registry over 2 weeks lifted citation share another 11 percentage points by week 8. Across 4 operators on our team we ran the registry build in parallel to compress the rollout window.

The third tier was a robots.txt block I would not have looked for

Around week 7 I checked our robots.txt to confirm GPTBot was allowed. It was not. A security plugin our previous dev team installed in 2023 had a default blocklist that included GPTBot, Claude-Web, and PerplexityBot. The plugin had updated itself silently three times since then and was actively maintaining the blocklist as a feature.

I would not have caught this without fetching /robots.txt with curl once per bot, sending the user-agent string each bot reports. The Search Console crawl report did not surface it because Search Console only reports Googlebot's view, not what the AI crawlers see.

I removed the plugin, added the seven-bot allowlist canon we now ship (GPTBot, Google-Extended, Claude-Web, PerplexityBot, OAI-SearchBot, ChatGPT-User, anthropic-ai), and waited 14 days for the bots to re-crawl. Citation share lifted another 6 percentage points across the queries the bots had been silently blocked from for over a year.
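The per-bot check can also be scripted. Below is a rough sketch using Python's standard urllib.robotparser against a robots.txt body you have already fetched. The sample file is hypothetical, modeled on the plugin's blocklist, and the function name is illustrative. It only evaluates the rules in the file, so it will not catch a server that rejects a bot by user agent before robots.txt is ever consulted.

```python
from urllib.robotparser import RobotFileParser

# The seven-bot allowlist named above.
AI_BOTS = [
    "GPTBot",
    "Google-Extended",
    "Claude-Web",
    "PerplexityBot",
    "OAI-SearchBot",
    "ChatGPT-User",
    "anthropic-ai",
]


def blocked_bots(robots_txt, path="/", bots=AI_BOTS):
    """Return the bots that the given robots.txt text disallows for `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in bots if not parser.can_fetch(bot, path)]


# Hypothetical robots.txt resembling what the security plugin produced.
SAMPLE = """\
User-agent: GPTBot
User-agent: Claude-Web
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""

print(blocked_bots(SAMPLE))
# -> ['GPTBot', 'Claude-Web', 'PerplexityBot']
```

Running this on a schedule against the live file would have flagged the plugin's blocklist long before the citation report did.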

The fourth tier surfaced an entity-graph error we had not noticed

A query came in to support around week 9 from a customer who said ChatGPT had recommended a different product when he asked for "the best Reddit API." The product the model recommended was a competitor with a similarly spelled brand name. I ran the brand-disambiguation audit and found ChatGPT had merged the two entities in its knowledge graph.

The fix was the sameAs anchor pattern. Every author bio, our company page, and the press kit got a Person or Organization schema with sameAs entries pointing to Wikidata, Crunchbase, LinkedIn, G2, and our Featured.com profile. The model resolved the entity correctly after the next index rebuild.
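As an illustration of the sameAs pattern, here is a small Python sketch that emits Organization markup. The function name is hypothetical and the profile URLs are example.com placeholders standing in for the real Wikidata, Crunchbase, LinkedIn, and G2 pages. The sameAs property itself is standard schema.org.

```python
import json


def organization_jsonld(name, url, same_as):
    """Build schema.org Organization markup with de-duplicated sameAs links."""
    profiles = list(dict.fromkeys(same_as))  # drop repeats, keep first-seen order
    return {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "sameAs": profiles,
    }


# Placeholder profile URLs; substitute the real profile pages.
org = organization_jsonld(
    name="Example Co",
    url="https://example.com",
    same_as=[
        "https://example.com/wikidata-profile",
        "https://example.com/crunchbase-profile",
        "https://example.com/linkedin-profile",
        "https://example.com/wikidata-profile",  # duplicate, removed on build
    ],
)
print(json.dumps(org, indent=2))
```

The same shape works for author bios by swapping the type to Person. Keeping the profile list in one place means every page emits an identical set of anchors for the entity.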

This one moved citation share another 8 percentage points by week 11. The entity confusion had been quietly leaking citations for roughly 18 months. Stopping the leak was a step-change, not a gradual climb.

The fifth tier was the compounding loop I had been ignoring

At week 12 we sat at 22 percent citation share across target queries. The fifth lift came from a structural pattern, not a single fix. Every pillar piece on redditapis.com now gets 3 to 5 adjacent pieces inside the property that cite it, plus at least one external high-authority placement (a Featured.com byline, a guest post on a relevant SaaS publication, an inclusion in a tools-list roundup). The internal cluster signals topical authority. The external placement signals editorial trust. Together they mark the page as a citation target the model can rely on.

What I would not do again is treat citation share as a content-quality problem. Quality is necessary, not sufficient. Without structural-data parity, entity-graph hygiene, and reciprocal citation density, the best content stays invisible. The first tier moves the needle 10x faster than another 5000-word evergreen guide.

The two patterns every linking team should run

I would not measure AI Overview citation share as a rank-track metric. It is a citation-density metric tied to entity recognition. Set up your tracking around weekly citation appearances per target query, tagged by which tier was last shipped, with lift per tier computed on a 30-day window post-ship. Without the tier tag you cannot diagnose which install actually moved the share.
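One way to compute that lift, sketched in Python. It assumes each observation is a (date, cited) pair from a citation check on a target query, and it defines lift as the citation share in the 30 days after a ship date minus the share in the 30 days before it. That definition, the function names, and the sample dates are assumptions for illustration, not a description of our tracking stack.

```python
from datetime import date, timedelta

WINDOW = timedelta(days=30)


def citation_share(observations, start, end):
    """Share of checks in [start, end) where we were cited; None if no checks."""
    in_window = [cited for day, cited in observations if start <= day < end]
    if not in_window:
        return None
    return sum(in_window) / len(in_window)


def lift_for_tier(observations, ship_date):
    """Share in the 30 days after ship minus share in the 30 days before."""
    before = citation_share(observations, ship_date - WINDOW, ship_date)
    after = citation_share(observations, ship_date, ship_date + WINDOW)
    if before is None or after is None:
        return None
    return after - before


def lift_by_tier(observations, ship_dates):
    """Map each tier tag to its lift, given {tier_tag: ship_date}."""
    return {tier: lift_for_tier(observations, d) for tier, d in ship_dates.items()}


# Made-up weekly checks: never cited before the ship date, cited once after.
ship = date(2026, 4, 1)
obs = [(ship - timedelta(days=7 * k), False) for k in range(1, 5)]
obs += [(ship + timedelta(days=7 * k), k == 3) for k in range(0, 4)]

print(lift_by_tier(obs, {"tier-1-schema-match": ship}))
# -> {'tier-1-schema-match': 0.25}
```

When two tiers ship within 30 days of each other their windows overlap, which is exactly the case where the tier tag alone cannot separate the lifts, so spacing the installs is worth the patience.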

I would not skip the robots.txt audit before the claims registry. We did, and lost about six weeks of rework on the registry because the bots could not crawl the pages we were re-anchoring. Sequence matters: schema match first, robots audit second, claims registry third, entity disambiguation fourth, citation density fifth.

What this generalizes to for any content team

The traditional link-building playbook still works for organic ranking. The citation layer is a separate compounding asset. Every operator I audit in 2026 either has AI Overview citation share as a tracked KPI on the CMO dashboard or is about to discover they are losing share to a competitor who does. The 5-tier ladder is the install sequence that took redditapis.com from invisible to dominant in 90 days. The pattern reproduces on every site I have run it on since.