We Studied 18,000 AI-Cited Pages Across 5 Platforms
If you've spent any time trying to figure out how to get your content cited by AI, you've probably noticed something: everyone has advice, and almost none of it is backed by real data. Most of it treats every recommendation as equally important. Should you prioritize schema markup or heading structure? Does word count matter more than freshness? The answer depends on what the data shows, and very little of the advice out there is grounded in it.

We get it. The frustration is real. The entire GEO industry sprang up overnight, and most of it is built on best guesses dressed up as certainty. We saw the same thing and decided to do something about it.

Over the past several months, we built a calibration system that collects actual citations from real AI platforms, extracts and analyzes the HTML of the pages that get cited, compares them against similar pages that don't get cited, and measures what's statistically different. Not what we think should matter. Not what a checklist says. What the data actually shows.

We're publishing everything we found: the numbers, the methodology, the limitations, and the honest admission that we still don't know most of what drives AI citation. Here's the short version.
What We Did
We sent 500 prompts across 12 topic categories to 5 AI surfaces. We collected 25,115 citations pointing to 18,129 unique pages. We extracted 213 structural signals from every page and measured what statistically differentiates cited from non-cited pages.
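To make the comparison step concrete, here is a minimal sketch of what a single per-signal test can look like, assuming each signal is boolean and using a chi-square test on a 2x2 contingency table. The `signal_differentiates` helper and the counts in the example are illustrative assumptions, not our production pipeline or real data.

```python
# Hedged sketch: does one boolean signal (e.g. "has canonical tag") occur at a
# statistically different rate on cited vs. non-cited pages?
from scipy.stats import chi2_contingency

def signal_differentiates(cited_with, cited_total, uncited_with, uncited_total, alpha=0.05):
    """Chi-square test on a 2x2 table of signal presence vs. citation status."""
    table = [
        [cited_with, cited_total - cited_with],        # cited pages: with / without signal
        [uncited_with, uncited_total - uncited_with],  # non-cited pages: with / without
    ]
    _, p, _, _ = chi2_contingency(table)
    return p, p < alpha

# Hypothetical counts, for illustration only.
p, significant = signal_differentiates(9_100, 18_129, 6_200, 18_000)
print(f"p = {p:.2e}, differentiates: {significant}")
```

Running one test per signal across 213 signals also calls for a multiple-comparisons correction (Bonferroni or Benjamini-Hochberg, for example), or a handful of "significant" signals will show up by chance alone.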
Key Finding: Domain Authority Matters Most
Domain factors account for 77% of predictive importance. Page factors account for 23%. Page optimization compounds with authority — it doesn't substitute for it.
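For readers who want to see how a split like 77/23 can be computed, here is a hedged sketch: train a classifier on cited-vs-not labels, then sum per-feature importances into domain-level and page-level buckets. The features, data, and model below are synthetic stand-ins, not our actual feature set or model.

```python
# Hedged sketch: grouping model feature importances into domain vs. page
# buckets. All names and data here are synthetic illustrations.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
feature_names = ["domain_rating", "domain_age", "page_word_count", "page_has_canonical"]
X = rng.random((1_000, len(feature_names)))
# Synthetic labels weighted toward the first (domain-level) feature.
y = (X[:, 0] + 0.3 * X[:, 3] + rng.normal(0, 0.3, 1_000) > 0.8).astype(int)

model = GradientBoostingClassifier().fit(X, y)

# Sum per-feature importances by a domain_/page_ naming convention.
importance = dict(zip(feature_names, model.feature_importances_))
domain = sum(v for k, v in importance.items() if k.startswith("domain_"))
page = sum(v for k, v in importance.items() if k.startswith("page_"))
total = domain + page
print(f"domain: {domain / total:.0%}, page: {page / total:.0%}")
```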
Key Finding: Basic HTML Structure Matters
The signals that most consistently differentiate cited from non-cited pages are the fundamentals: proper doctype, language declaration, canonical tag, viewport meta tag, and meta description.
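As a rough illustration of what extracting these five signals from raw HTML can look like (our real extractor covers all 213 signals), here is a minimal sketch assuming BeautifulSoup and simple presence checks:

```python
# Hedged sketch: presence checks for the five basic structural signals.
from bs4 import BeautifulSoup

def extract_basic_signals(html: str) -> dict:
    soup = BeautifulSoup(html, "html.parser")
    return {
        "has_doctype": html.lstrip().lower().startswith("<!doctype"),
        "has_lang": bool(soup.html and soup.html.get("lang")),
        "has_canonical": soup.find("link", rel="canonical") is not None,
        "has_viewport": soup.find("meta", attrs={"name": "viewport"}) is not None,
        "has_meta_description": soup.find("meta", attrs={"name": "description"}) is not None,
    }

print(extract_basic_signals(
    '<!DOCTYPE html><html lang="en"><head>'
    '<link rel="canonical" href="https://example.com/">'
    '<meta name="viewport" content="width=device-width, initial-scale=1">'
    '<meta name="description" content="Example page."></head><body></body></html>'
))
# -> all five signals True for this example page
```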