The five signals AI engines use to decide whether to recommend you
The concrete, checkable things an AI engine looks at when it decides whether to name your business — crawler access, rendered content, structured data, llms.txt, and citable facts. The same checklist my audit runs.
Part 1 argued that the interface is turning from a list into an answer, and that ranking high no longer guarantees you’re named. This part is the practical half: when I crawl a site the way an AI engine does, here’s what I actually check. None of it is mysterious. All of it is fixable.
1. Crawler access: can the engine even read you?
The first thing I read is robots.txt, because if you block the AI crawlers, nothing else matters. The subtle failure here isn’t an obvious Disallow: / rule. It’s a contradiction. Plenty of sites (often via a managed CDN setting they forgot about) end up serving a robots.txt that blocks GPTBot, ClaudeBot, and Google-Extended in one User-agent group while allowing them in another. Which rule wins is ambiguous, and several crawlers honor the block. So decide deliberately which engines you want to read you, and make robots.txt say exactly that, once, without contradiction.
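To make the contradiction concrete, here is a minimal Python sketch built on the standard library’s `urllib.robotparser`. The robots.txt content, the example.com URL, and the helper names (`duplicated_agents`, `access_report`) are hypothetical. One caveat worth knowing: Python’s parser applies the first group that matches an agent, while RFC 9309 asks crawlers to merge every group naming the same agent, so two well-behaved parsers can reach different answers on a file like this. That disagreement is the ambiguity in practice.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with the contradiction described above:
# GPTBot is blocked in one group and allowed again in a later one.
ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /

User-agent: GPTBot
Allow: /
"""

AI_AGENTS = ["GPTBot", "ClaudeBot", "Google-Extended"]


def duplicated_agents(robots_txt: str) -> list[str]:
    """Return user-agent tokens named on more than one User-agent line."""
    counts: dict[str, int] = {}
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "user-agent":
            agent = value.strip().lower()
            counts[agent] = counts.get(agent, 0) + 1
    return sorted(agent for agent, n in counts.items() if n > 1)


def access_report(robots_txt: str, url: str = "https://example.com/") -> dict[str, bool]:
    """Ask the stdlib parser (first matching group wins) whether each agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in AI_AGENTS}


if __name__ == "__main__":
    print("Named in more than one group:", duplicated_agents(ROBOTS_TXT))
    for agent, allowed in access_report(ROBOTS_TXT).items():
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Against the sample, it reports `gptbot` as named in two groups, and all three agents as blocked under the stdlib’s first-match reading. A crawler that merges groups may read GPTBot differently, which is exactly why the file should say it once.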
2. Server-rendered content: is the substance in the HTML?
Engines read HTML far more reliably than they execute JavaScript. If your services, location, and key claims only appear after a client-side framework hydrates, an engine may see an empty shell. Open your page’s raw source (View Source, not the rendered DOM): if the words that describe your business aren’t there as plain text, the engine probably can’t use them.
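A minimal sketch of that View Source test in Python: it parses the raw HTML with the standard library’s `html.parser`, ignores anything inside `<script>` and `<style>`, and reports which key phrases never appear as plain text. It assumes you have already fetched the HTML with a client that does not run JavaScript; the business name, phrases, sample pages, and helper names are hypothetical.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text nodes from raw HTML, skipping <script> and <style> bodies."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def _normalize(text: str) -> str:
    # Collapse whitespace and lowercase so line breaks in the markup don't matter.
    return " ".join(text.split()).lower()


def missing_phrases(raw_html: str, phrases: list[str]) -> list[str]:
    """Return the key phrases that do NOT appear as plain text in the raw HTML."""
    extractor = VisibleTextExtractor()
    extractor.feed(raw_html)
    extractor.close()
    text = _normalize(" ".join(extractor.chunks))
    return [p for p in phrases if _normalize(p) not in text]


# Hypothetical pages: one server-rendered, one client-side "empty shell".
SERVER_RENDERED = (
    "<html><body><h1>Acme Plumbing</h1>"
    "<p>Emergency plumbing in Leeds.</p></body></html>"
)
CLIENT_SHELL = (
    '<html><body><div id="root"></div>'
    '<script>render("Acme Plumbing: emergency plumbing in Leeds")</script>'
    "</body></html>"
)
KEY_PHRASES = ["Acme Plumbing", "Emergency plumbing in Leeds"]

if __name__ == "__main__":
    print("server-rendered missing:", missing_phrases(SERVER_RENDERED, KEY_PHRASES))
    print("client shell missing:", missing_phrases(CLIENT_SHELL, KEY_PHRASES))
```

On the sample pages, the server-rendered version is missing nothing, while the client-side shell is missing both phrases even though they exist inside its JavaScript.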
3. Structured data: can it map you to an entity?
JSON-LD is how you hand the engine a clean, machine-readable identity instead of making it infer one from prose. An Organization block (or its more specific subtype, ProfessionalService) with your name, URL, email, and area served ties your brand, contact details, and services together unambiguously. A FAQPage block exposes your answers as structured Q&A.
One honest caveat I tell every client: Google deprecated FAQ rich results for most sites back in 2023, so don’t add FAQ schema expecting snippets in the blue links. You add it because it gives answer engines clean, attributable Q&A to pull from — a different and growing payoff.
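As a sketch of what those two blocks look like, the Python below builds them as plain dictionaries and serializes each into a `<script type="application/ld+json">` element. The property names (`name`, `url`, `email`, `areaServed`, `mainEntity`, `acceptedAnswer`) are schema.org vocabulary; the business details, the FAQ text, and the helper function names are hypothetical placeholders.

```python
import json

# Hypothetical business details: replace with your own.
BUSINESS = {
    "name": "Acme Plumbing",
    "url": "https://www.example.com/",
    "email": "hello@example.com",
    "area_served": ["Leeds", "Bradford"],
}

# Hypothetical FAQ content as (question, answer) pairs.
FAQS = [
    ("Do you offer emergency call-outs?", "Yes, 24 hours a day across Leeds and Bradford."),
    ("Do you service boilers?", "Yes, we service and repair gas boilers."),
]


def organization_jsonld(biz: dict) -> dict:
    """schema.org ProfessionalService (an Organization subtype) for the business."""
    return {
        "@context": "https://schema.org",
        "@type": "ProfessionalService",
        "name": biz["name"],
        "url": biz["url"],
        "email": biz["email"],
        "areaServed": biz["area_served"],
    }


def faq_jsonld(faqs: list[tuple[str, str]]) -> dict:
    """schema.org FAQPage built from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in faqs
        ],
    }


def script_tag(data: dict) -> str:
    """Wrap a JSON-LD object in the <script> element that goes into the page HTML."""
    # Escape "<" so a value containing "</script>" cannot close the element early.
    body = json.dumps(data, indent=2, ensure_ascii=False).replace("<", "\\u003c")
    return f'<script type="application/ld+json">\n{body}\n</script>'


if __name__ == "__main__":
    print(script_tag(organization_jsonld(BUSINESS)))
    print(script_tag(faq_jsonld(FAQS)))
```

Each printed element goes into the page HTML as-is, ideally in the server-rendered markup so an engine that skips JavaScript still sees it.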
4. llms.txt: a map written for the models
llms.txt is an emerging convention: a plain-text file at your site root (/llms.txt) that summarizes who you are and points to your most important pages. Think of it as a robots.txt for meaning rather than permission. It’s cheap to add, and it signals that you understand the AI-native web. If you’re hiring someone for GEO, that’s exactly the signal their own site should send.
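Because the format is still a proposal rather than a standard, treat this as a sketch under assumptions: it follows the commonly proposed Markdown layout (an H1 with the site name, a blockquote summary, then H2 sections listing links with optional notes). The `Link` and `LlmsTxt` classes, the business, and the URLs are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    """One entry in an llms.txt section: a page title, its URL, an optional note."""
    title: str
    url: str
    note: str = ""


@dataclass
class LlmsTxt:
    """Hypothetical builder for the proposed llms.txt Markdown layout."""
    name: str
    summary: str
    sections: dict[str, list[Link]] = field(default_factory=dict)

    def render(self) -> str:
        lines = [f"# {self.name}", "", f"> {self.summary}"]
        for heading, links in self.sections.items():
            lines += ["", f"## {heading}", ""]
            for link in links:
                note = f": {link.note}" if link.note else ""
                lines.append(f"- [{link.title}]({link.url}){note}")
        return "\n".join(lines) + "\n"


# Hypothetical business; the rendered text would be served at https://www.example.com/llms.txt
EXAMPLE = LlmsTxt(
    name="Acme Plumbing",
    summary="Emergency and scheduled plumbing for homes and small businesses in Leeds.",
    sections={
        "Services": [
            Link("Emergency call-outs", "https://www.example.com/emergency", "24-hour response"),
            Link("Boiler servicing", "https://www.example.com/boilers"),
        ],
        "Contact": [
            Link("Contact and service area", "https://www.example.com/contact"),
        ],
    },
)

if __name__ == "__main__":
    print(EXAMPLE.render())
```

Keeping the content in a small data structure means the same facts can feed both this file and your structured data, which helps them stay consistent.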
5. Citable facts: would you trust this source?
Finally, I read the page the way a model “decides” what to cite: are the facts clear, consistent, and specific? What you do, who you serve, where, and how to reach you. Vague, hedged, or contradictory information (different phone numbers across pages, no clear service area) makes a model less likely to commit your name to an answer. Clarity is a ranking signal for trust.
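Parts of this check can be automated. As one hypothetical example, the Python sketch below collects phone numbers across pages and flags the site when more than one distinct number turns up. It assumes you already have each page’s text keyed by path; the regular expression is a loose heuristic rather than a real phone-number parser, and it does not reconcile national and international forms of the same number (0113… versus +44 113…). The page text and helper names are made up for illustration.

```python
import re

# Loose heuristic for phone-like strings: optional "+", then digits separated by
# spaces, dots, dashes, or parentheses. Not a full phone-number parser.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")
MIN_DIGITS = 10  # shorter digit runs (dates, prices) are ignored


def normalize(number: str) -> str:
    """Keep only the digits, plus a leading '+' if the original had one."""
    digits = re.sub(r"\D", "", number)
    return "+" + digits if number.startswith("+") else digits


def phone_numbers_by_page(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each normalized phone number to the set of page paths it appears on."""
    found: dict[str, set[str]] = {}
    for path, text in pages.items():
        for match in PHONE_RE.findall(text):
            number = normalize(match)
            if len(number.lstrip("+")) < MIN_DIGITS:
                continue
            found.setdefault(number, set()).add(path)
    return found


# Hypothetical page text: the About page carries a stale number.
PAGES = {
    "/": "Acme Plumbing. Call 0113 496 0000 for emergency call-outs.",
    "/contact": "Phone: 0113 496 0000. Email hello@example.com.",
    "/about": "Family-run since 2009. Questions? Ring 0113 496 0999.",
}

if __name__ == "__main__":
    found = phone_numbers_by_page(PAGES)
    if len(found) > 1:
        print("Inconsistent phone numbers across pages:")
        for number, paths in sorted(found.items()):
            print(f"  {number}: {', '.join(sorted(paths))}")
```

On the sample pages it finds two distinct numbers and shows which pages carry each, which is the kind of contradiction that makes a model hesitate.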
The point
These five aren’t growth hacks; they’re hygiene for the answer interface. The reason I built an audit engine around them is that you can’t see the gap from inside your own site — everything looks fine in a browser, while the engine quietly skips you. If you want the specific list for your site, that’s literally what the free check produces.