How It Works

We analyze your content the way AI systems actually process it — not the way traditional search engines do.

Step 1: We crawl like AI does

You choose your rendering mode: JavaScript disabled (the way ChatGPT, Claude, and Perplexity typically see your page) or JavaScript enabled (closer to how Gemini and Bing Copilot see it). Most sites look different depending on which mode you use. That difference matters, and we think you should be able to see it.
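To make the difference concrete, here's a minimal sketch of the two modes using off-the-shelf tools (requests for the no-JS fetch, Playwright for the rendered one). This isn't our production crawler; it just shows why the same URL can return different HTML in each mode.

```python
# Minimal sketch of the two rendering modes. Not our production crawler,
# just an illustration of why the two modes can disagree.
import requests
from playwright.sync_api import sync_playwright

def fetch_no_js(url: str) -> str:
    """Raw HTML as the server delivers it: roughly what a non-rendering
    crawler sees."""
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    return resp.text

def fetch_with_js(url: str) -> str:
    """HTML after scripts have run: closer to what a rendering crawler sees."""
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")
        html = page.content()
        browser.close()
    return html

# On a heavily client-rendered site, fetch_with_js(url) can contain content
# that fetch_no_js(url) never sees; that gap is what your mode choice exposes.
```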

Step 2: We extract like AI does

Before any scoring happens, we pull your content in a structured way — metadata, schema, headings, body text — prioritized in the order retrieval systems tend to value it. We also look at how your content breaks apart at the paragraph level, because AI systems don't evaluate whole pages. They work with pieces.
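As a rough illustration (not our exact pipeline), an extraction pass like this pulls out those layers and breaks the body into paragraph-level chunks:

```python
# Illustrative extraction pass: layers ordered roughly by the priority
# retrieval systems tend to give them. Assumes well-formed JSON-LD.
import json
from bs4 import BeautifulSoup

def extract_layers(html: str) -> dict:
    soup = BeautifulSoup(html, "html.parser")
    meta_desc = soup.find("meta", attrs={"name": "description"})
    return {
        "title": soup.title.string if soup.title else None,
        "meta_description": meta_desc.get("content") if meta_desc else None,
        # Schema.org JSON-LD blocks, if any
        "schema": [
            json.loads(tag.string or "{}")
            for tag in soup.find_all("script", type="application/ld+json")
        ],
        "headings": [h.get_text(strip=True)
                     for h in soup.find_all(["h1", "h2", "h3"])],
        # Paragraph-level pieces: AI systems retrieve chunks, not whole pages
        "chunks": [p.get_text(strip=True)
                   for p in soup.find_all("p") if p.get_text(strip=True)],
    }
```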

Step 3: We score against a structured framework

Ten metrics for pages, six categories for sites, organized around the three points where AI citation tends to break down: whether AI can find your content, whether it can read it, and whether it would actually cite it. Every metric comes with a reason explaining the score — not just a number.
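The shape of a scored result looks roughly like this; the field names and example values are for this sketch, not our API:

```python
# Illustrative result shape: every metric carries a reason, not just a number.
from dataclasses import dataclass

@dataclass
class MetricScore:
    name: str    # e.g. "AI Readability"
    pillar: str  # which breakdown point it maps to: "find", "read", or "cite"
    score: float # 0-100 for this metric
    reason: str  # the explanation that accompanies the number

example = MetricScore(
    name="AI Readability",
    pillar="read",
    score=72.0,
    reason="Average paragraph length is fine, but 3 of 14 sections "
           "have no heading, which hurts chunk-level retrieval.",
)
```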

Step 4: Our page scoring weights come from real citation data

We collected 25,115 citations from 5 AI platforms — ChatGPT, Claude, Gemini, Perplexity, and Google AI Overviews — across 500 prompts in 12 topic categories. We then fetched and analyzed the publicly accessible HTML of 18,129 cited pages and 4,622 matched control pages (pages that rank well in search for the same queries but weren't cited by any AI model), respecting robots.txt and standard crawling conventions. We extracted 213 structural signals from each page and measured which signals statistically differentiate cited from non-cited pages using Cohen's d effect sizes.

The weight assigned to each of our 10 page metrics reflects how strongly that metric differentiates cited pages from non-cited pages in our data. Crawl & Index Signals carries the highest weight (14.8%) because basic HTML fundamentals — doctype, lang attribute, canonical tag, meta description — are the most consistent differentiator. Domain Expertise carries the lowest weight (6.3%) because most authority signal comes from domain-level factors outside the page itself. We produce separate weight sets for JS and non-JS rendering modes, because the signals AI crawlers see differ depending on whether they execute JavaScript.
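The core statistic is standard. Here's a simplified sketch of the calibration math, using Cohen's d in its usual pooled-variance form and a naive normalization step (production also volume-weights across the five platforms):

```python
# Sketch of the calibration math, simplified for illustration.
import numpy as np

def cohens_d(cited: np.ndarray, control: np.ndarray) -> float:
    """Effect size: how strongly one signal separates cited from control pages."""
    n1, n2 = len(cited), len(control)
    pooled_var = ((n1 - 1) * cited.var(ddof=1) +
                  (n2 - 1) * control.var(ddof=1)) / (n1 + n2 - 2)
    return (cited.mean() - control.mean()) / np.sqrt(pooled_var)

def weights_from_effects(effect_by_metric: dict[str, float]) -> dict[str, float]:
    """Turn absolute effect sizes into weights that sum to 1: stronger
    differentiators get proportionally more weight."""
    total = sum(abs(d) for d in effect_by_metric.values())
    return {metric: abs(d) / total for metric, d in effect_by_metric.items()}
```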

Step 5: Site scans combine content analysis with domain authority data

For full-site scans, we evaluate your site's content and structure across 6 categories (Site Structure, Topic Coverage, Internal Linking, Content Distribution, AI Discoverability, Technical Readability) using AI analysis of your crawled pages. Each category is weighted equally in your overall site score.

We want to be upfront about why: the Indexably Method empirically calibrated page-level weights by comparing cited vs non-cited pages across 5 AI platforms. We haven't done the same for site-level categories. We don't have a dataset of "cited sites vs non-cited sites" scored across these 6 dimensions with enough statistical power to derive meaningful weights. Rather than make up numbers that imply precision we don't have, we weight them equally. If we're able to calibrate site-level weights in the future, we will.

What we do ground in real data is the domain authority context. We pull your domain's authority data from DataForSEO's backlink and search visibility APIs — referring domain count, backlink diversity (referring subnets), organic keyword coverage, and estimated traffic value. This data comes from the same source we used in the Indexably Method, where we analyzed 2,000 domains and found that domain-level authority accounts for roughly 77% of what predicts AI citation. Your Domain Signals card shows where you stand relative to those benchmarks.
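On the scoring arithmetic itself, the equal weighting is exactly what it sounds like. A minimal sketch:

```python
# Site scoring sketch: six categories, equal weights, simple average.
SITE_CATEGORIES = [
    "Site Structure", "Topic Coverage", "Internal Linking",
    "Content Distribution", "AI Discoverability", "Technical Readability",
]

def site_score(category_scores: dict[str, float]) -> float:
    """Equal weighting: no category counts more than another until we have
    data to calibrate otherwise. Scores are 0-100 per category."""
    return sum(category_scores[c] for c in SITE_CATEGORIES) / len(SITE_CATEGORIES)
```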

Step 6: Your recommendations are grounded in what cited pages actually look like

When we recommend improvements, we don't guess at what matters. We reference specific benchmarks from our citation data — for example, that 84% of cited pages declare their HTML language, or that cited pages average 121 words between headings. When your page falls below a benchmark, we tell you the specific gap. For site scans, your domain authority data shapes how we prioritize recommendations. Our data shows that page and site optimization produces measurable citation lift for high-authority domains, but has less measurable impact for smaller domains. If your domain authority is below the threshold where site optimization produces results, we'll tell you that honestly and focus your recommendations on building foundations that compound as your authority grows.
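Here's a toy version of a benchmark check, built on the two example benchmarks above (the real recommendation engine covers many more signals):

```python
# Toy benchmark check. The two thresholds come from our citation data;
# everything else here is simplified for illustration.
CITED_PAGE_BENCHMARKS = {
    "declares_html_lang": 0.84,         # share of cited pages with a lang attribute
    "avg_words_between_headings": 121,  # cited-page average
}

def heading_density_recommendation(words_between_headings: float) -> str | None:
    benchmark = CITED_PAGE_BENCHMARKS["avg_words_between_headings"]
    if words_between_headings > benchmark:
        return (f"Your page averages {words_between_headings:.0f} words between "
                f"headings; cited pages average {benchmark}. Consider adding "
                f"subheadings so retrieval systems get smaller, cleaner chunks.")
    return None  # at or better than benchmark: no recommendation needed
```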

Step 7: We recalibrate as AI behavior changes

AI systems evolve, and what they favor today may shift. We periodically recalibrate by running new citation collection across all major AI platforms and recomputing our weights. This means your scores reflect current AI behavior, not a snapshot from when we launched.

Where Our Weights Come From

Our weights are not assumptions. They're derived from a specific study:

- 500 prompts across 12 query categories (medical, financial, technical, local, e-commerce, and more)
- 5 AI platforms with web search enabled (ChatGPT, Claude, Gemini, Perplexity, Google AI Overviews)
- 25,115 citations collected, pointing to 18,129 unique pages
- 4,622 matched control pages from Brave Search — pages ranking well for the same queries but not cited by any AI model
- 213 structural signals extracted from every page's HTML
- Cohen's d effect sizes measuring which signals statistically differentiate cited from non-cited pages
- Volume-weighted cross-model analysis, so platforms producing more citation data have proportionally more influence

Each of our 10 page metrics is a composite of related signals. The weight reflects how strongly that composite differentiates cited from non-cited pages in the data. A higher weight means that metric is a stronger differentiator.
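Volume weighting in particular is easy to state precisely. A sketch, assuming per-platform effect sizes and citation counts:

```python
# Volume-weighted cross-model averaging: platforms that produced more
# citations get proportionally more influence on a signal's combined effect.
def volume_weighted_effect(effect_by_platform: dict[str, float],
                           citations_by_platform: dict[str, int]) -> float:
    total = sum(citations_by_platform.values())
    return sum(effect_by_platform[p] * citations_by_platform[p] / total
               for p in effect_by_platform)

# e.g. a platform contributing 10,000 of the 25,115 citations carries
# roughly 40% of the combined effect size for each signal.
```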

Page-Level Composite Weights

Volume-weighted non-JS — the production weight set used for scoring:

1. Crawl & Index Signals: 14.8%
2. RAG Retrieval Suitability: 13.0%
3. Content Relevance: 12.6%
4. Engagement Cues: 12.5%
5. Multimodal Readiness: 11.4%
6. AI Readability: 8.8%
7. Citation Suitability: 7.2%
8. Structured Metadata: 6.7%
9. Page Freshness: 6.7%
10. Domain Expertise: 6.3%
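Applying the published set is a plain weighted average; a sketch, assuming each metric is scored 0-100:

```python
# The published non-JS weight set, applied as a plain weighted average.
PAGE_WEIGHTS = {
    "Crawl & Index Signals": 0.148, "RAG Retrieval Suitability": 0.130,
    "Content Relevance": 0.126,     "Engagement Cues": 0.125,
    "Multimodal Readiness": 0.114,  "AI Readability": 0.088,
    "Citation Suitability": 0.072,  "Structured Metadata": 0.067,
    "Page Freshness": 0.067,        "Domain Expertise": 0.063,
}  # sums to 1.000

def overall_page_score(metric_scores: dict[str, float]) -> float:
    """Weighted average of the ten metric scores (each 0-100)."""
    return sum(metric_scores[m] * w for m, w in PAGE_WEIGHTS.items())
```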

Your Score Is Not a Citation Guarantee

This is important to understand before anything else: an Indexably score is not the same thing as asking ChatGPT about your business and seeing if you show up. Indexably measures whether your pages have the structural characteristics that AI-cited pages tend to have. It's a readiness assessment — are your pages built in a way that gives them the best chance of being cited when an AI does find them?

Actually showing up in an AI response is a different question entirely. It depends on whether the AI searches the web for that query at all, whether its retrieval system finds your page among millions of candidates, whether your content matches the specific question someone asked, your domain's authority and reputation, which AI model the person is using, and even timing — Semrush found that 40-60% of cited sources rotate monthly.

You could score 95/100 and never get cited if nobody asks a question your page answers. You could score 60/100 and get cited constantly because you're the most credible source on a niche topic.

What your score does tell you: among the factors you can control on your pages and site, where do you stand compared to pages that actually get cited? It's the part of the equation you can fix. The rest — authority, brand recognition, whether someone asks the right question — is built over time.

Why asking an AI to "check" your score gives a different answer

If you take your Indexably results to ChatGPT and say "go look at my site and tell me how AI-visible it is," you'll get a different assessment. That's expected — and it doesn't mean either answer is wrong. They're measuring different things.

When ChatGPT visits your site in that moment, it's one model forming a subjective opinion based on what it sees, using its general training about what "good" looks like. It has no comparison data. It doesn't know what 18,000 cited pages actually look like structurally. It hasn't compared your page against a control set. It's giving you an informed guess.

Indexably is doing something different: comparing your page's measurable signals against statistical benchmarks from pages that actually got cited by AI platforms, and pages that didn't. The weights come from observing the gap between those two groups — not from any single model's opinion about what matters.

It's the difference between asking a friend to read your resume and tell you if it looks good, versus analyzing your resume against 18,000 resumes that actually got interviews and measuring exactly which signals yours is missing. The friend's opinion is useful. The data-backed analysis answers a different, more specific question.

What We Score (what you can control)

Page Scans — 10 Metrics: Crawl & Index Signals, RAG Retrieval Suitability, Content Relevance, Engagement Cues, Multimodal Readiness, AI Readability, Citation Suitability, Structured Metadata, Page Freshness, Domain Expertise. Weighted by the Indexably Method; your overall page score is a weighted average.

Site Scans — 6 Categories: Site Structure, Topic Coverage, Internal Linking, Content Distribution, AI Discoverability, Technical Readability. Weighted equally; your overall site score is a simple average.

What we measure but don't score

Domain Rank (site scans only): Your domain's rank, referring domain count, backlink diversity, and organic search visibility — pulled from DataForSEO. This data tells you where your domain stands relative to the domains AI actually cites. It shapes your recommendations but doesn't affect your site score, because domain authority takes years to build and isn't something you fix with a code change.

What we can't measure

Cross-web authority signals (page scans only): For single page scans, we don't collect domain-level authority data like backlinks, referring domains, or organic search visibility. For site scans, we pull this data from DataForSEO and show it in your Domain Signals card — see "What we measure but don't score" above.

Content relevance to specific queries: Whether your content actually answers the question someone asks an AI. This is probably the single largest factor in whether you get cited, and no tool can measure it from the outside because it depends on what people ask.

Two Types of Analysis

Single Page Scan: Analyzes one URL across 10 metrics that determine citation likelihood. Weights are calibrated from observed AI citation behavior. Recommendations reference specific benchmarks from our citation data. Best for: optimizing key landing pages, blog posts, and product pages for AI citation.

Full-Site Scan: Analyzes multiple pages and evaluates site-wide AI visibility using 6 categories, plus pulls domain authority data from DataForSEO to frame your recommendations. Best for: understanding overall site discoverability, your domain's position relative to AI-cited domains, and identifying structural improvements worth prioritizing.

AI Platforms We Analyze For

The Indexably Method includes citation data from all five platforms. Each platform weights page signals differently:

- ChatGPT (OpenAI) — prioritizes freshness & metadata
- Claude (Anthropic) — spreads weight most evenly
- Perplexity — resembles the cross-model average
- Google Gemini — values crawlability most
- Google AI Overviews — most balanced overall

We don't connect to these platforms during a scan. We evaluate your content based on what our citation data shows their retrieval systems tend to reward. Our production scoring uses volume-weighted cross-model analysis to balance across all of them.

Open Research

We publish the Indexably Method — our weights, findings, and documentation — openly. If it's useful to your work — whether you're building tools, advising clients, or running your own studies — we'd appreciate a link back. It helps us keep doing this and sharing it.