What On-Page Signals Actually Differentiate AI-Cited Pages

We analyzed 18,129 pages cited by AI and 4,622 control pages, extracting 213 structural signals from each.

The 10 Scoring Composites, Ranked

Crawl & Index Signals leads at 14.8% weight (d=0.158), followed by RAG Retrieval Suitability at 13.0%. The distribution is fairly flat: with ten composites, an even split would give each 10%, so no single dimension dominates.
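The d values quoted here read like standardized effect sizes. Assuming they are Cohen's d (the text does not say so explicitly), a minimal sketch of how such a value could be computed for one signal across the cited and control groups is shown below. The function name and the sample inputs are hypothetical, and the pooled-standard-deviation form is one common variant, not necessarily the one used in the study.

```python
import math
from statistics import mean, variance


def cohens_d(cited, control):
    """Standardized mean difference between two groups.

    Assumption: "d" in the text is Cohen's d with a pooled standard
    deviation. The study's exact formula is not stated.
    """
    n1, n2 = len(cited), len(control)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")

    # Pool the two sample variances, weighting by degrees of freedom.
    pooled_var = (
        (n1 - 1) * variance(cited) + (n2 - 1) * variance(control)
    ) / (n1 + n2 - 2)
    if pooled_var == 0:
        raise ValueError("pooled variance is zero; d is undefined")

    return (mean(cited) - mean(control)) / math.sqrt(pooled_var)


# Hypothetical signal values for three cited and three control pages.
print(cohens_d([2, 4, 6], [1, 3, 5]))
```

Under Cohen's conventional guideline, values near 0.2 count as small effects, which would be consistent with the flat distribution described above.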

Strongest Individual Signals

Lexical diversity (d=0.268) is the strongest non-behavioral signal. Cited pages use more varied vocabulary, have shorter paragraphs, and are more likely to follow basic HTML hygiene.
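How lexical diversity and paragraph length were measured is not specified. As a rough illustration only, the sketch below uses the type-token ratio (unique words divided by total words) and the mean number of words per paragraph. Both function names, the tokenization regex, and the blank-line paragraph rule are assumptions made for this example.

```python
import re


def type_token_ratio(text):
    """Unique tokens divided by total tokens (assumed diversity metric)."""
    # Assumption: lowercase alphanumeric runs (plus apostrophes) are tokens.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def mean_paragraph_words(text):
    """Average word count per paragraph, splitting on blank lines."""
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    if not paragraphs:
        return 0.0
    return sum(len(p.split()) for p in paragraphs) / len(paragraphs)


sample = "the cat and the dog"
print(type_token_ratio(sample))  # 4 unique tokens out of 5 -> 0.8
print(mean_paragraph_words("one two three\n\nfour five"))  # (3 + 2) / 2 -> 2.5
```

The raw type-token ratio tends to fall as a text gets longer, so comparing pages of very different lengths would likely need a length-normalized variant.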