Methodology & sources

How the audit works

Every audit ends with this section so you can see what we checked, where each recommendation comes from, and how confident we are in it. Share this page when you want to send the framework — not a specific run — to a client or developer.

What the audit checks and why

01. Can AI find your site?

We check robots.txt, sitemap discovery, and whether your CDN/WAF is letting documented AI crawlers reach your pages.
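The first of these checks can be sketched with Python's standard-library robots.txt parser. The sample file and paths below are made up for illustration; a real audit would fetch the site's live robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body; a live audit would download
# https://<site>/robots.txt instead of using an inline sample.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /private/

User-agent: *
Disallow:
"""

def crawler_allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Parse a robots.txt body and ask whether user_agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)

print(crawler_allowed(ROBOTS_TXT, "GPTBot", "/private/page"))  # False
print(crawler_allowed(ROBOTS_TXT, "GPTBot", "/blog/post"))     # True
```

The same call works for any documented crawler token (ClaudeBot, PerplexityBot, and so on); only the user-agent string changes.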

02. Can AI understand your content?

We check headings and page structure, structured data, and whether the main content is reachable without running JavaScript.
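A minimal sketch of this kind of static-HTML probe, using only the standard library: it records the headings and JSON-LD blocks visible without executing JavaScript. The sample page and class name are hypothetical, not the audit's actual parser.

```python
import json
from html.parser import HTMLParser

class StructureProbe(HTMLParser):
    """Collect headings and JSON-LD blocks from server-rendered HTML."""
    def __init__(self):
        super().__init__()
        self.headings = []   # (tag, text) pairs, in document order
        self.json_ld = []    # parsed application/ld+json payloads
        self._capture = None

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3", "h4", "h5", "h6"):
            self._capture = tag
        elif tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._capture = "ld+json"

    def handle_endtag(self, tag):
        self._capture = None

    def handle_data(self, data):
        if self._capture == "ld+json":
            self.json_ld.append(json.loads(data))
        elif self._capture:
            self.headings.append((self._capture, data.strip()))

HTML = """<html><body>
<h1>Example article</h1>
<script type="application/ld+json">{"@type": "Article"}</script>
<div id="app"></div>
</body></html>"""

probe = StructureProbe()
probe.feed(HTML)
print(probe.headings)  # [('h1', 'Example article')]
print(probe.json_ld)   # [{'@type': 'Article'}]
```

If the probe finds an empty body and a single mount-point div, the main content almost certainly requires JavaScript, which many AI crawlers do not run.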

03. Will AI cite you as a source?

We check trust signals (HTTPS, author markup, outbound citations, freshness) and the content-shape factors that nudge an AI assistant toward picking you when it answers.
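These trust signals can be sketched as a simple checklist over page metadata. The field names, the 18-month freshness window, and the input shape are illustrative assumptions, not the audit's actual scoring model.

```python
from datetime import date, timedelta

def trust_signals(url: str, metadata: dict, today: date) -> dict:
    """Evaluate the four trust signals for one page.

    metadata is assumed to hold JSON-LD-style fields; the 548-day
    (roughly 18-month) freshness window is an arbitrary example.
    """
    modified = metadata.get("dateModified")
    fresh = (
        modified is not None
        and (today - date.fromisoformat(modified)) <= timedelta(days=548)
    )
    return {
        "https": url.startswith("https://"),
        "author_markup": "author" in metadata,
        "outbound_citations": metadata.get("citation_count", 0) > 0,
        "fresh": fresh,
    }

page = {"author": {"name": "A. Writer"}, "dateModified": "2025-06-01",
        "citation_count": 3}
# Every signal is True for this hypothetical page.
print(trust_signals("https://example.com/post", page, date(2025, 12, 1)))
```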

Where these recommendations come from

We don't make up rules. Every check is grounded in one of four source types — and findings link back to the exact doc they came from.

Official documentation from AI companies

Crawler names, user agents, and opt-out mechanisms come straight from each provider's own documentation.

Established web standards

Checks like robots.txt parsing, structured data, and sitemap discovery follow the documented protocol — not heuristics.
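Sitemap discovery is a good example of following the documented protocol rather than a heuristic: the sitemaps protocol lets robots.txt declare any number of Sitemap: directives, and the field name is case-insensitive. A minimal stdlib sketch:

```python
def discover_sitemaps(robots_txt: str) -> list[str]:
    """Collect Sitemap: directives from a robots.txt body.

    Splits each line at the first colon only, so the colon inside the
    sitemap URL itself is left intact.
    """
    sitemaps = []
    for line in robots_txt.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "sitemap" and value.strip():
            sitemaps.append(value.strip())
    return sitemaps

# Hypothetical robots.txt with two declared sitemaps.
ROBOTS = (
    "User-agent: *\n"
    "Disallow:\n"
    "Sitemap: https://example.com/sitemap.xml\n"
    "sitemap: https://example.com/news.xml\n"
)
print(discover_sitemaps(ROBOTS))
# ['https://example.com/sitemap.xml', 'https://example.com/news.xml']
```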

Emerging standards


Worth knowing about, but not yet officially adopted. We surface these as low-effort possible-future wins, not as critical fixes.
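One such emerging proposal is llms.txt: a Markdown file served from the site root that gives assistants a curated map of your most useful pages. The shape below follows the public llms.txt proposal; the company, pages, and URLs are invented.

```markdown
# Example Co

> Example Co makes widgets. This file lists the pages most useful
> to AI assistants answering questions about us.

## Docs

- [Product overview](https://example.com/overview.md): what the product does
- [API reference](https://example.com/api.md): endpoints and authentication

## Optional

- [Company history](https://example.com/history.md)
```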

Research and industry observation

Content-structure recommendations and trust-signal heuristics lean on published research and reporting from infrastructure providers, rather than on the AI companies' own documentation.

How confident we are in each finding

Every finding carries one of three confidence levels so you can tell at a glance which recommendations are based on documented standards and which are based on observed behaviour or emerging practice.

Definitive
Based on standards or official documentation.
Examples: robots.txt blocks GPTBot; no JSON-LD Article schema present.
Suggestive
Based on observed behaviour, not conclusive.
Example: CDN appears to block GPTBot — our audit IP received 403, but real GPTBot uses published IP ranges that may be treated differently.
Emerging best practice
Based on early signals or recent research, not established standards.
Example: llms.txt is recommended but not yet required by major AI providers.
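The suggestive level above can be made concrete with a tiny classifier: a 403 seen from the audit's own IP is only suggestive, because real GPTBot traffic comes from published IP ranges that a CDN may treat differently. The function name, labels, and logic are illustrative, not the audit's actual implementation.

```python
def classify_block_finding(status: int, from_published_crawler_ip: bool) -> str:
    """Map an observed HTTP response to a confidence level.

    Only a 403 observed from a provider's published crawler IP range
    (or an explicit robots.txt rule) counts as definitive; the same 403
    from an unrelated audit IP is merely suggestive.
    """
    if status == 403:
        return "definitive" if from_published_crawler_ip else "suggestive"
    return "no-block-observed"

print(classify_block_finding(403, from_published_crawler_ip=False))  # suggestive
```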

About this audit run

Reproducible

We show the exact provenance of every report so anyone receiving it can re-run it, verify it, or just trust where the numbers came from.

Audit timestamp: recorded per audit
Tool version: recorded per audit
Audit region: recorded per audit
User agent: recorded per audit
Pages sampled: recorded per audit
Duration: recorded per audit