We Studied 18,000 AI-Cited Pages Across 5 Platforms
By Andrew Coffey · 2026-03-24
If you've spent any time trying to figure out how to get your content cited by AI, you've probably noticed something: everyone has advice, and almost none of it is backed by real data. Most of it treats every recommendation as equally important. Should you prioritize schema markup or heading structure? Does word count matter more than freshness? The answers depend on what the data shows, and very little of that advice is grounded in it.

We get it. The frustration is real. The entire GEO industry sprang up overnight, and most of it is built on best guesses dressed up as certainty.

We saw the same thing and decided to do something about it. Over the past several months, we built a calibration system that collects actual citations from real AI platforms, extracts and analyzes the HTML of the pages that get cited, compares them against similar pages that don't get cited, and measures what's statistically different. Not what we think should matter. Not what a checklist says. What the data actually shows.

We're publishing everything we found: the numbers, the methodology, the limitations, and the honest admission that we still don't know most of what drives AI citation. Here's the short version.
What We Did
We sent 500 prompts across 12 topic categories to 5 AI surfaces: ChatGPT, Claude, Gemini, Perplexity, and Google AI Overviews. We collected 25,115 citations pointing to 18,129 unique pages. For comparison, we pulled 4,622 control pages from Brave Search — pages that rank well for the same queries but weren't cited by any AI model. We extracted 213 structural signals from the HTML of every page. We also analyzed 2,000 domains (1,000 cited, 1,000 control) for authority metrics like referring domains, backlink diversity, and organic search visibility. Then we combined everything — page-level signals and domain-level signals — in a single analysis to see how they interact. We publish the Indexably Method — our statistical methods, our findings, and our limitations — in full so you can evaluate them for yourself.
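The "measures what's statistically different" step relies on Cohen's d, the effect-size statistic we report throughout the findings below. As a concrete illustration, here is a minimal sketch of how d compares one signal between cited and control pages. The function is the standard pooled-variance formula; the sample values are made up for demonstration, not drawn from our dataset.

```python
import numpy as np

def cohens_d(cited: np.ndarray, control: np.ndarray) -> float:
    """Standardized mean difference between two groups,
    using the pooled standard deviation."""
    n1, n2 = len(cited), len(control)
    pooled_var = ((n1 - 1) * cited.var(ddof=1)
                  + (n2 - 1) * control.var(ddof=1)) / (n1 + n2 - 2)
    return (cited.mean() - control.mean()) / np.sqrt(pooled_var)

# Toy example: a binary signal such as "page has a canonical tag"
# (1 = present, 0 = absent). Values are illustrative only.
cited = np.array([1, 1, 1, 0, 1, 1, 0, 1], dtype=float)
control = np.array([1, 0, 0, 1, 0, 1, 0, 0], dtype=float)
print(f"d = {cohens_d(cited, control):.3f}")  # positive d favors cited pages
```

As a rough guide, |d| near 0.2 is a small effect, 0.5 is medium, and 0.8 is large. That's the scale to keep in mind for the numbers below.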
The 6 Things We're Most Confident About
1. Domain authority matters far more than page optimization. Our logistic regression combining page and domain signals shows domain-level factors account for 77% of predictive importance. Page factors account for 23%. (We sketch one way to compute a split like this after the list.) This aligns with what Ahrefs, SE Ranking, and multiple other studies have found from different angles. When we say "domain-level factors," we mean the signals we measured from DataForSEO: referring domains, backlink diversity, organic keyword coverage, and Domain Rank. These are likely proxies for something broader: overall entity authority, which includes brand recognition, topical association, and trust built across the web. Our data measures the footprint of that authority, not its full scope. But the direction is clear: what exists outside your pages matters far more than what's on them.

2. Page optimization only produces measurable lift for high-authority domains. We split pages into four groups by domain authority. Page-level signals only show positive effects in the top quartile (d=0.24-0.37). For everyone else, the effects are flat or slightly negative. Page optimization compounds with authority; it doesn't substitute for it.

3. Backlink diversity beats backlink volume. Referring subnets (unique network blocks linking to you) are more than twice as predictive of citation (d=0.513) as raw referring domain count (d=0.235). Total backlink count (d=0.098) barely registers. It's not how many links you have. It's how many independent corners of the internet endorse you.

4. Basic HTML structure is the strongest page-level differentiator. Not schema markup. Not FAQ format. Not word count. The signals that most consistently differentiate cited from non-cited pages are the fundamentals: having a proper doctype, declaring your language, including a canonical tag, having a viewport meta tag, and writing a meta description. Boring stuff. But it's what the data shows.

5. Each AI model has distinct preferences. ChatGPT prioritizes freshness and structured metadata. Claude spreads weight evenly across all factors. Gemini values crawlability most. Google AI Overviews is the most balanced. Optimizing for one model may not help with another.

6. Cited pages aren't longer; they're better structured. Word count shows a slight negative correlation with citation. Cited pages have higher vocabulary diversity, shorter paragraphs, and shorter maximum sections. The pattern is "write well and structure it clearly," not "write more."
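On point 1: we aren't reproducing our exact regression code here, but one common way to produce a domain-vs-page split like 77/23 is to fit a logistic regression on standardized features and group the absolute coefficient magnitudes by feature family. The sketch below illustrates that approach on synthetic data; it is not our pipeline, and the feature counts (4 domain, 10 page) simply mirror the metrics discussed in Post 3.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in data: 4 domain-level and 10 page-level features.
X = rng.normal(size=(2000, 14))
# Make the outcome depend mostly on the domain features, for illustration.
logits = X[:, :4] @ np.array([1.2, 0.9, 0.7, 0.5]) + X[:, 4:] @ np.full(10, 0.1)
y = (logits + rng.logistic(size=2000) > 0).astype(int)

# Standardize so coefficient magnitudes are comparable across features.
model = LogisticRegression(max_iter=1000).fit(StandardScaler().fit_transform(X), y)

# Group absolute coefficients into a domain-vs-page importance share.
abs_coef = np.abs(model.coef_[0])
domain_share = abs_coef[:4].sum() / abs_coef.sum()
print(f"domain: {domain_share:.0%}, page: {1 - domain_share:.0%}")
```

Other importance measures (permutation importance, regularized fits) yield similar groupings on real data; the point is the split between feature families, not the specific statistic.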
What We Don't Know
We want to be honest about the limits of what we found, because the GEO industry has too much false certainty already.

We don't know what causes citation. Everything in our data is correlation. We measured what cited pages look like compared to non-cited pages. We didn't prove that adding a canonical tag will get you cited. It might just be that well-maintained sites, which happen to have canonical tags, are also well-known, and it's the being well-known that matters.

Most of what drives citation isn't in our data. Our combined model of page and domain factors shows a McFadden pseudo R² of 0.10 (see the short computation at the end of this section). That's a meaningful improvement over random guessing, but it means the vast majority of what determines whether AI cites your page is something we didn't measure. The most likely candidate: whether your content actually answers the specific question someone asked. That's a content relevance question, not a structural one.

We don't know if these findings are stable. AI systems update constantly. We plan to re-run calibration periodically, but right now we have one snapshot in time. Semrush found that 40-60% of cited sources rotate monthly. The landscape moves fast.

We don't know if local businesses follow the same patterns. Our data aggregates all 12 query categories together. Local service queries might follow fundamentally different rules. We haven't tested that yet.

We don't know if improving these signals actually changes outcomes. That's the causation gap. The Princeton GEO paper (KDD 2024) showed that adding statistics to pages improved visibility by 22-41% in controlled experiments. Our observational data is consistent with their findings, but we haven't run our own intervention experiments yet.

We'll keep working on all of this, and we'll keep publishing what we find.
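For readers unfamiliar with the metric: McFadden's pseudo R² compares the fitted model's log-likelihood with that of an intercept-only model that always predicts the base rate, R² = 1 - LL(model) / LL(null). Here is a minimal computation with made-up probabilities, not our model's output.

```python
import numpy as np

def mcfadden_r2(y, p_model):
    """McFadden pseudo R^2: 1 - LL(model) / LL(null), where the
    null model predicts the overall citation rate for every page."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p_model, dtype=float)
    ll_model = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    p_null = np.full_like(y, y.mean())  # intercept-only baseline
    ll_null = np.sum(y * np.log(p_null) + (1 - y) * np.log(1 - p_null))
    return 1 - ll_model / ll_null

# Toy data: actual citation outcomes and a model's predicted probabilities.
y = [1, 0, 1, 1, 0, 0, 1, 0]
p = [0.8, 0.3, 0.6, 0.7, 0.4, 0.2, 0.5, 0.3]
print(f"McFadden pseudo R^2 = {mcfadden_r2(y, p):.2f}")
```

A value of 0.10 on this scale means the model clearly beats the base-rate guess while leaving most of the outcome unexplained, which is exactly the claim above.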
The Deep Dives
We've broken the full findings into four focused posts. Each one stands alone, but together they tell the complete story.

Post 1: What On-Page Signals Actually Differentiate AI-Cited Pages. The page-level data: 213 signals measured across 18,129 cited pages and 4,622 controls. Which composites matter most, which individual signals are strongest, which signals actually hurt your chances, and what the JS vs non-JS rendering split reveals about what AI crawlers actually see. If you want to know what to fix on your pages, start here.

Post 2: Domain Authority vs Page Optimization — The Numbers. The domain-level data and the combined analysis. DataForSEO Rank at d=1.075 is 7x stronger than any page signal. Referring subnets beat referring domains. The logistic regression shows 77/23 domain vs page. And the honest framing of what this means for sites at different authority levels, including the uncomfortable truth that page optimization doesn't measurably help most sites.

Post 3: The Complete Ranking — Every Factor From Strongest to Weakest. All 14 metrics (10 page + 4 domain) ranked together in one unified table. The quartile analysis showing exactly how page optimization interacts with domain authority at each level. The within-domain comparison that holds authority perfectly constant and isolates pure page-level effects. This is the post for people who want the full picture in one place.

Post 4: How 5 AI Platforms Cite Differently — And What To Do About It. Per-model weight breakdowns for ChatGPT, Claude, Gemini, Perplexity, and Google AI Overviews. Which platforms favor which factors. Why optimizing for ChatGPT might not help with Perplexity. And the volume-weighted approach we use to balance across all of them. This is the post for people who care about specific platforms.
The Indexably Method
For the complete pipeline, statistical methods, formulas, and data tables, see the full Indexably Method documentation. It covers everything: how we collect citations, how we build the control set, how Cohen's d works, how composites are scored, how weights are derived, how the logistic regression was structured, and every caveat we can think of. It's long and detailed. It's meant to be.
Why We Built This
Indexably started because we wanted to know what actually drives AI citation, and we couldn't find the answer published anywhere. So we built a system to measure it ourselves: the Indexably Method, a calibration system that collects real citations, compares cited pages against non-cited pages, and measures what's statistically different. We're publishing the weights, the effect sizes, the per-model breakdowns, and the limitations. When Indexably scores your page, the weights come from this data, not from a checklist someone assembled from blog posts.

Does that mean we've cracked the code of AI citation? No. The data is very clear that we haven't. Domain authority explains far more than anything we can score on your page. And most of what drives citation is probably content relevance (whether your page answers the right question), which no tool can measure from the outside. But within the factors you can control and we can measure, we now know which ones make the most difference, which ones are noise, and which ones change depending on the AI platform. That's what Indexably's scores are built on. And every time we run a new calibration, the scores get better.
A Note on Sharing This
We publish the Indexably Method openly. If it helps someone build a tool, run their own research, or make better decisions for their clients, that's a good outcome. If it does help your work, all we'd ask is a mention or link back. Not because we're owed it, but because it helps us keep doing this kind of work and sharing it openly.