FogTrail Team

Claude produced exactly 6 brand citations in all three waves of our study, making it the most deterministic AI search engine we measured. With a volatility score of just 8%, Claude's citation behavior barely shifted week to week, while ChatGPT swung by 39% across the same period. If your brand is invisible on Claude today, the data says it will almost certainly be invisible on Claude next week.

This matters because determinism cuts both ways. Claude is the easiest engine to diagnose (its behavior is consistent enough to trust a single measurement) but potentially the hardest to change. Its citation patterns appear locked in, which means brands that have already earned Claude's attention are sitting on a stable asset, and brands that haven't are staring at a wall that does not move.

The Data: Five Engines, Three Waves, One Outlier

We ran 20 identical B2B SaaS queries across ChatGPT, Perplexity, Gemini, Grok, and Claude once per week for three consecutive weeks in March 2026. We tracked 25 brands across 5 categories: CRM, project management, email marketing, analytics, and dev tools.

Every engine's citation count fluctuated. Except Claude.

Engine       W1 Citations   W2 Citations   W3 Citations   Change Range   Volatility Score
Claude       6              6              6              0              8%
Gemini       7              6              8              2              18%
Perplexity   7              5              4              3              21%
Grok         2              7              7              5              27%
ChatGPT      23             12             14             11             39%

ChatGPT's citation count dropped by nearly half between Wave 1 and Wave 2 (23 to 12), then partially recovered to 14. Grok more than tripled from 2 to 7 and held. Perplexity drifted downward. Claude sat at 6, unchanged.

The volatility score measures the percentage of citations that changed between consecutive waves. Claude's 8% means that across all 20 queries, its citation decisions barely moved. ChatGPT's 39% means roughly two out of every five citations were different from one week to the next.
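The volatility score can be made concrete with a short sketch. The exact formula behind the study's numbers is not published here, so treat this as one plausible definition; the brand domains below are illustrative, not drawn from the dataset:

```python
def volatility(prev: set[str], curr: set[str]) -> float:
    """Fraction of citations that changed between two consecutive waves:
    citations present in only one of the two waves, divided by all
    citations seen in either wave."""
    if not prev and not curr:
        return 0.0
    changed = prev.symmetric_difference(curr)
    return len(changed) / len(prev | curr)


# A Claude-like engine: the citation set barely moves week to week.
w1 = {"salesforce.com", "hubspot.com", "zoho.com", "pipedrive.com"}
w2 = {"salesforce.com", "hubspot.com", "zoho.com", "pipedrive.com"}
print(volatility(w1, w2))  # 0.0

# A ChatGPT-like engine: a large share of the citations churn.
w1 = {"salesforce.com", "hubspot.com", "zoho.com"}
w2 = {"salesforce.com", "freshworks.com", "monday.com"}
print(round(volatility(w1, w2), 2))  # 0.8
```

Under this definition, a score of 0 means the two waves cited exactly the same set, and 1 means no overlap at all.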

Claude's Salesforce Bias: What Determinism Looks Like in Practice

Numbers tell part of the story. The more revealing evidence is what Claude actually recommends, and how stubbornly it sticks to those recommendations.

Consider this: when we asked all five engines "best CRM for startups," four of them agreed on HubSpot at position #1. Claude was the sole dissenter. It recommended Salesforce first.

That happened in Wave 1. It happened again in Wave 2. And again in Wave 3. Three consecutive weeks, identical behavior, while the other four engines all pointed to HubSpot.

Query Q0: "best CRM for startups"

Engine       W1            W2            W3
Perplexity   HubSpot       HubSpot       HubSpot
ChatGPT      HubSpot       HubSpot       HubSpot
Gemini       HubSpot       HubSpot       HubSpot
Grok         HubSpot       HubSpot       Salesforce
Claude       Salesforce    Salesforce    Salesforce

Claude dissented against a 4-engine consensus for three straight weeks. This is not a random fluctuation. This is a stable behavioral characteristic baked into how Claude ranks brands.

The recommendation itself is counterintuitive. Salesforce is an enterprise CRM with enterprise pricing. Recommending it first for a "best CRM for startups" query suggests Claude weights brand authority and market position differently than other engines, potentially favoring established names even when query context points elsewhere.

Of the four CRM queries in the study, Claude placed Salesforce at #1 in three of them in every single wave. It was the most Salesforce-aligned engine in the entire study.

Stability Beyond CRM

Claude's determinism was not limited to CRM queries. Its total URL output held at almost exactly the same level across all three waves: 170 URLs in Wave 1, 160 in Wave 2, 160 in Wave 3. Compare that to Gemini (251, 278, 240) or ChatGPT (124, 126, 125), whose volume was stable but whose brand mix was wildly unstable.

Claude also maintained 22 distinct brand mentions in all three waves. It consistently surfaced the broadest set of brands among all engines, but it linked to only 6 of them with actual citation URLs each time.

This pattern, broad mentions paired with narrow citations, is another signature of Claude's behavior. It talks about many brands. It links to few. And the few it links to do not change.

How ChatGPT's Volatility Creates a Different Problem

ChatGPT represents the opposite end of the spectrum. ActiveCampaign appeared in ChatGPT's email marketing responses in Waves 1 and 2, then vanished entirely in Wave 3. Zero mentions across all four email queries. Meanwhile, ActiveCampaign still appeared on the other four engines, ruling out a universal signal change. This was ChatGPT-specific noise.

The same week, ChatGPT gave Netlify its first ever #1 position for a dev tools query, something Netlify had failed to achieve across 28 prior engine responses. Was it a breakthrough or a blip? Given ChatGPT's 39% volatility score, the honest answer is: it could be either. You would need multiple waves to know.

For brands trying to measure their AI visibility, this creates a practical problem. A single ChatGPT snapshot is close to meaningless. One week you are cited, the next you are not, and the week after you might be back. Claude, by contrast, gives you a reliable baseline. If Claude cites you this week, it will almost certainly cite you next week.

What This Means for Your AEO Strategy

Claude is your diagnostic engine. Because its behavior is so stable, a single Claude measurement gives you a trustworthy read on where you stand. If Claude cites you, your content has durable signals that at least one engine finds credible. If Claude ignores you, no amount of retesting will produce a different result. Something structural needs to change.

ChatGPT requires repeated measurement. Any brand making strategic decisions based on a single ChatGPT query is building on sand. ChatGPT's citation count swung from 23 to 12 to 14 across three identical test runs. You need multi-wave monitoring to separate signal from noise.

Determinism makes Claude the hardest engine to move. The same stability that makes Claude easy to diagnose also makes it resistant to change. Claude's Salesforce bias held for three straight weeks against a 4-engine consensus. Whatever ranking heuristics Claude uses, they appear deeply embedded. Brands trying to break into Claude's citation set should expect a longer timeline than with more volatile engines.

What You Can Do About It

  • Use Claude as your baseline measurement. Run your key queries through Claude first. If you are invisible there, you have a structural content problem, not a timing problem. Fix the fundamentals before worrying about other engines.
  • Do not measure ChatGPT once and call it done. Run the same queries at least three times over two to three weeks. Average the results. A single ChatGPT snapshot has a roughly 40% chance of being different next week.
  • Audit Claude's specific brand preferences in your category. Claude dissented against a 4-engine consensus in CRM. Check whether Claude has a similar locked-in preference in your category. If it does, understand what that preferred brand has (documentation quality, market authority, third-party coverage) that you lack.
  • Prioritize the volatile engines for quick wins. If you need to show citation gains fast, focus on ChatGPT and Grok (27% volatility). Their citation sets shift enough that new content can break in within weeks. Claude will take longer.
  • Track each engine independently. As of March 2026, the five major AI search engines behave like five different recommendation systems. An "average" across all engines hides the reality that Claude and ChatGPT are operating under fundamentally different citation logic.
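The "measure ChatGPT more than once" advice above can be sketched as a small aggregation step. This is a minimal illustration, not our measurement pipeline; the brand names and the two-of-three threshold are assumptions:

```python
from collections import Counter


def stable_citations(waves: list[set[str]], min_waves: int = 2) -> set[str]:
    """Keep only brands cited in at least `min_waves` of the collected
    waves, filtering out one-off appearances on a volatile engine."""
    counts = Counter(brand for wave in waves for brand in wave)
    return {brand for brand, n in counts.items() if n >= min_waves}


# ChatGPT-like churn: ActiveCampaign shows in W1 and W2, vanishes in W3;
# Brevo appears once and may be noise.
waves = [
    {"mailchimp", "activecampaign", "klaviyo"},
    {"mailchimp", "activecampaign"},
    {"mailchimp", "klaviyo", "brevo"},
]
print(sorted(stable_citations(waves)))
# ['activecampaign', 'klaviyo', 'mailchimp']
```

Raising `min_waves` to 3 would keep only brands cited in every wave, which is roughly the bar Claude's own output already clears.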

Methodology

We ran 20 queries across 5 AI search engines: ChatGPT, Perplexity, Gemini, Grok, and Claude. Each query was sent as a real-time API call, simulating how actual users interact with these platforms. We tracked 25 B2B SaaS brands across 5 categories. The three waves were collected on March 6, March 10, and March 15, 2026, producing 300 engine-query data points.
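In code, one wave of that collection boils down to a nested loop over engines and queries. The sketch below uses a generic `run_query` callable as a stand-in for each platform's real-time API client, which we do not reproduce here; the row shape is an assumption:

```python
from datetime import date


def collect_wave(run_query, engines, queries, wave_date):
    """One wave: every query sent to every engine, timestamped.
    `run_query(engine, query)` is a placeholder for a real-time API
    call and should return the list of cited URLs."""
    rows = []
    for engine in engines:
        for query in queries:
            rows.append({
                "date": wave_date,
                "engine": engine,
                "query": query,
                "citations": run_query(engine, query),
            })
    return rows


# 20 queries x 5 engines x 3 waves = 300 engine-query data points.
engines = ["chatgpt", "perplexity", "gemini", "grok", "claude"]
queries = [f"query-{i}" for i in range(20)]  # placeholders for the 20 B2B SaaS queries
stub = lambda engine, query: []  # no live API calls in this sketch
dataset = [row
           for d in (date(2026, 3, 6), date(2026, 3, 10), date(2026, 3, 15))
           for row in collect_wave(stub, engines, queries, d)]
print(len(dataset))  # 300
```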

FAQ

Is Claude actually deterministic, or does it just have low variance?

Claude is not perfectly deterministic. Its 8% volatility score means some citations did shift between waves. But compared to ChatGPT (39%) and Grok (27%), Claude's output is remarkably stable. "Most deterministic" is relative to the other engines in our study.

Does Claude's Salesforce bias affect other categories?

Claude's Salesforce preference was specific to CRM queries. In other categories, Claude did not show the same level of dissent against consensus. However, Claude was the most enterprise-favoring engine overall, placing enterprise brands at #1 in 60% of queries in Wave 3, the highest rate of any engine.

Should I optimize for Claude specifically?

Not in isolation. Claude is one of five major AI search engines, and it produces the fewest brand citations of any engine in our study (6 per wave vs ChatGPT's 12 to 23). But Claude's stability makes it a useful diagnostic tool. If you can earn a Claude citation, it is likely to persist.

How often should I check my AI visibility on each engine?

For Claude, once every two to three weeks is likely sufficient given its low volatility. For ChatGPT, weekly checks are the minimum to account for its 39% week-to-week citation churn. For Gemini and Perplexity, biweekly checks strike a reasonable balance.

Will Claude's deterministic behavior persist?

We cannot predict engine updates. Anthropic could change Claude's retrieval behavior at any time. But across three consecutive waves, Claude showed no signs of increasing volatility. Its stability appears to be a structural property of how it handles brand recommendations, not a temporary state.
