- There Is No Universal AI Citation Formula
We have written twice about how AI selects what to cite. First in How AI Decides What to Quote, where we examined positional bias and the governance implications of transformer attention patterns. Then in What 1.2 Million ChatGPT Responses Actually Reveal, where we separated verified signal from commercial noise in the emerging GEO field.
Both analyses shared the same limitation: they treated AI citation as a single phenomenon. One dataset, one model, one set of patterns applied universally.
New research breaks that assumption open.
The Dataset
Kevin Indig and Amanda Johnson analyzed approximately 98,000 ChatGPT citation rows drawn from roughly 1.2 million responses, sourced through the Gauge platform. What makes this study different from prior work is the segmentation. They split the data across seven verticals: B2B SaaS, Finance, Healthcare, Education, Crypto, HR Tech, and Product Analytics.
The headline finding is not a new optimization trick. It is a structural discovery: there is no universal formula. The signals that lift citation rates in one vertical actively suppress them in another.
This is a fundamentally different claim than what the GEO field has been making. Every “AI SEO checklist” published in the last year assumes that citation mechanics are consistent across domains. The data says otherwise.
What Varies by Vertical
Start with word count. In CRM and SaaS content, longer pages correlate with a 1.59x citation lift. Longer is better. In Finance, the relationship inverts: shorter pages win, with a 0.86x multiplier for high word counts. Same signal, opposite direction.
Dates in content are a strong positive signal in most verticals. Finance is the exception, where date presence correlates with a 0.65x suppression. One possible explanation: financial content with dates may trigger freshness penalties in retrieval, where outdated financial data is actively harmful and the system has learned to deprioritize it.
Price mentions are the strongest universal negative signal, suppressing citations to 0.5x-0.8x of baseline in most verticals. Finance again breaks the pattern at 1.16x. In a domain where prices are the substance (interest rates, asset valuations, transaction costs), price mentions signal relevance rather than commercialism.
The Knowledge Graph finding deserves particular attention. Conventional GEO advice says to pack content with named entities. The data shows the opposite: high-cited pages have fewer Knowledge Graph entities (0.81x). The explanation is counterintuitive but logical. Pages dense with niche, specific entities (methodology names, precise statistics, named tool comparisons) outperform pages dense with broad, well-known entities. The model does not reward entity volume. It rewards entity specificity.
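A note on what these multipliers mean. A figure like 1.59x is a ratio of citation rates: how often pages carrying the signal get cited, relative to the vertical's own baseline. The sketch below shows how such a ratio could be computed from per-page data; the column names, sample values, and the 2,000-word threshold are illustrative assumptions, not details from the study.

```python
import pandas as pd

# Hypothetical per-page data: whether the page was cited, and its word count.
# Column names, values, and the 2,000-word threshold are illustrative only.
pages = pd.DataFrame({
    "vertical":   ["SaaS", "SaaS", "SaaS", "Finance", "Finance", "Finance"],
    "word_count": [3200, 900, 2800, 2600, 700, 650],
    "cited":      [1, 0, 1, 0, 1, 1],
})

def word_count_lift(df: pd.DataFrame, threshold: int = 2000) -> float:
    """Citation rate of long pages divided by the vertical's baseline rate.

    Above 1.0 means long pages are cited more often than average;
    below 1.0 means they are cited less often (the 0.86x Finance pattern).
    """
    baseline = df["cited"].mean()
    long_rate = df.loc[df["word_count"] >= threshold, "cited"].mean()
    return long_rate / baseline

for vertical, group in pages.groupby("vertical"):
    print(vertical, round(word_count_lift(group), 2))
```

The study's own methodology is more involved than this, but the sketch makes the key point visible: the ratio is always relative to a vertical's own baseline, which is why the same signal can land above 1.0 in one domain and below it in another.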
What Is Universal (Barely)
Two signals held across all seven verticals.
Declarative opening language produced a +14% aggregate lift. Sentences that begin with clear, definitive statements rather than hedging or context-setting. This aligns with the definitional language finding from earlier studies, now confirmed at a finer grain.
Hedging language was a negative signal everywhere. No vertical rewarded qualifications, caveats, or tentative phrasing. The model consistently prefers content that states what it knows rather than content that carefully avoids being wrong.
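Teams that want a rough read on these two signals in their own content do not need the study's classifiers. The heuristic below is a deliberately naive sketch; the word list, the regular expression, and the scoring are assumptions made for illustration, not the study's methodology.

```python
import re

# Illustrative hedge words and opener pattern; not the study's classifiers.
HEDGES = {"may", "might", "could", "possibly", "perhaps", "arguably", "likely", "seems"}
DECLARATIVE_OPENER = re.compile(r"^\s*\w+(?:\s+\w+){0,3}\s+(is|are|means|requires|reduces|increases)\b")

def opening_style(text: str) -> dict:
    """Score the first sentence for declarative versus hedged phrasing."""
    first_sentence = text.strip().split(".")[0]
    words = {w.lower().strip(",;") for w in first_sentence.split()}
    return {
        "declarative_opening": bool(DECLARATIVE_OPENER.match(first_sentence)),
        "hedge_count": len(words & HEDGES),
    }

print(opening_style("Index funds are the lowest-cost way to hold a diversified portfolio."))
# -> {'declarative_opening': True, 'hedge_count': 0}
print(opening_style("It could be argued that index funds may possibly suit some investors."))
# -> {'declarative_opening': False, 'hedge_count': 3}
```

Even a detector this crude will flag pages whose openings bury the claim under qualifications.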
The Heading Dead Zone
One of the more specific findings: pages with 3-4 headings perform worse than pages with zero headings. Every vertical showed this pattern.
The likely mechanism is structural. Zero headings means the content is a continuous, focused argument. Five or more headings means the content is well-organized reference material. Three to four headings occupies a middle ground: broken up enough to fragment the argument but not structured enough to function as organized reference. The retrieval system gets chunks that are too short to be self-contained and too disconnected to form a coherent answer.
This has direct implications for content teams following generic advice to “add headers for AI.” More headers are not better. Coherent structure is better, and that sometimes means no headers at all.
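To make the chunking mechanism concrete, here is a minimal diagnostic sketch, assuming a retrieval pipeline that splits pages at markdown headings before embedding them. The splitting rule and the 120-word threshold are illustrative assumptions, not details reported in the study.

```python
import re

def chunk_by_headings(markdown: str) -> list[str]:
    """Split a page at markdown headings, roughly as many retrieval pipelines do."""
    chunks = re.split(r"\n(?=#{1,6} )", markdown)
    return [c.strip() for c in chunks if c.strip()]

def diagnose(markdown: str, min_words: int = 120) -> dict:
    """Flag pages whose heading structure yields fragments too short to stand alone."""
    chunks = chunk_by_headings(markdown)
    short = [c for c in chunks if len(c.split()) < min_words]
    return {
        "headings": len(re.findall(r"^#{1,6} ", markdown, flags=re.M)),
        "chunks": len(chunks),
        "short_chunks": len(short),  # fragments unlikely to be self-contained answers
    }

page = "Intro paragraph.\n\n## Setup\nTwo lines of setup.\n\n## Usage\nOne line.\n\n## FAQ\nShort."
print(diagnose(page))
# -> {'headings': 3, 'chunks': 4, 'short_chunks': 4}
```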
Corporate Content Dominates, UGC Barely Registers
The source type distribution is stark. Corporate content accounts for 94.7% of all citations. User-generated content accounts for 5.3%.
In Finance and Healthcare, the UGC share drops to 0.5% and 1.8% respectively. This suggests YMYL (Your Money or Your Life) suppression is operating at the retrieval level. The system learned from its training data that authoritative, institutional sources are preferable for high-stakes domains, and it applies that preference aggressively.
Reddit, despite its growing prominence in traditional search results, accounts for only 2-5% of AI citations per vertical. The “Reddit effect” that SEO practitioners have observed in Google has not translated to AI citation behavior. LinkedIn, by contrast, appears in 11% of AI responses and ranks second for AI citations overall.
What This Changes About Governance
The prior articles in this series argued that AI citation patterns are a governance question. This dataset sharpens the argument considerably.
If citation patterns were universal, a single governance playbook could address them. Organizations could audit their content against one set of structural criteria. The problem would be solvable with a checklist.
Vertical-specific patterns make the problem harder and more important. An organization in healthcare faces a fundamentally different citation environment than one in SaaS. The same content strategy that increases visibility in one domain may actively reduce it in another. Generic “AI readiness” audits that apply the same structural recommendations across industries are not just unhelpful. They may be counterproductive.
Consider the practical implication for a financial services firm. Standard GEO advice would tell them to write longer content, include dates for freshness signaling, and avoid price mentions. The data shows the exact opposite is true for their vertical. Following generic advice would move them in the wrong direction.
This extends beyond marketing. When AI systems are summarizing regulatory guidance, surfacing compliance documentation, or assembling research for decision-makers, the vertical-specific biases in what gets cited become a source of systematic error. A healthcare organization whose AI tools disproportionately cite corporate content over clinical research has a governance problem that no amount of prompt engineering solves.
The Implication for Organizations
Three things follow from these findings.
Generic AI visibility strategies are wrong by default. Any recommendation that does not account for vertical-specific patterns is guessing. Organizations need to understand how citation mechanics work in their specific domain before they can govern for them.
Content structure is a domain-specific governance control. The structure of your documentation, the presence or absence of dates, the level of entity specificity, the use of price data: all of these become governance decisions once you understand that AI treats them differently by domain. A sketch of what a vertical-aware check might look like follows below.
The model’s preferences are not neutral. A 94.7% corporate citation rate means AI systems are systematically amplifying institutional voices and suppressing individual ones. In some domains this may be appropriate (you want medical citations from institutions, not forums). In others, it introduces a bias that organizations relying on AI-sourced information should understand and account for.
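What does a vertical-aware check look like in practice? A minimal sketch follows, with per-vertical profiles loosely echoing the figures discussed above. The thresholds, field names, and scoring rules are assumptions for illustration; real profiles would have to come from citation data in your own domain.

```python
from dataclasses import dataclass

@dataclass
class PageSignals:
    word_count: int
    has_dates: bool
    price_mentions: int
    heading_count: int

# Illustrative per-vertical profiles, loosely based on the figures discussed above.
PROFILES = {
    "saas":    {"prefer_long": True,  "dates_help": True,  "prices_help": False},
    "finance": {"prefer_long": False, "dates_help": False, "prices_help": True},
}

def audit(page: PageSignals, vertical: str) -> list[str]:
    """Return vertical-specific warnings instead of one generic checklist."""
    profile = PROFILES[vertical]
    warnings = []
    if profile["prefer_long"] and page.word_count < 1500:
        warnings.append("Content may be too short for this vertical.")
    if not profile["prefer_long"] and page.word_count > 1500:
        warnings.append("Long-form content tends to underperform in this vertical.")
    if not profile["dates_help"] and page.has_dates:
        warnings.append("Dated content correlates with citation suppression here.")
    if not profile["prices_help"] and page.price_mentions > 0:
        warnings.append("Price mentions suppress citations in this vertical.")
    if 3 <= page.heading_count <= 4:
        warnings.append("3-4 headings falls in the dead zone across all verticals.")
    return warnings

page = PageSignals(word_count=3400, has_dates=True, price_mentions=2, heading_count=4)
print(audit(page, "finance"))
```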
The question is no longer “how does AI decide what to cite?” We answered that in prior work. The question is now: “how does AI decide what to cite in your domain?” That is the governance question that matters.
This analysis synthesizes The Science of What AI Actually Rewards (March 2026).
Victorino Group helps organizations understand and govern how AI shapes information visibility. Let’s talk.
All articles on The Thinking Wire are written with the assistance of Anthropic's Opus LLM. Each piece goes through multi-agent research to verify facts and surface contradictions, followed by human review and approval before publication. If you find any inaccurate information or wish to contact our editorial team, please reach out at editorial@victorinollc.com.