- Search Didn't Die. Your Ability to Measure It Did.
The prediction was confident and specific: ChatGPT would gut Google Search. It did not. Google reported Search queries at an all-time high, AI Mode crossed one billion monthly users, AI Overviews reached 2.5 billion, and Search revenue grew 19% year over year (Sundar Pichai, via Sherwood News, June 2026). The headline disruption arrived as the opposite of the forecast.
The real disruption is quieter and harder to spot on an earnings call. The instrument you used to measure search just stopped giving the same reading twice.
The Number That Should End Most AI-Visibility Dashboards
Growth Memo ran the largest reproducibility test on AI search citations to date: 82,619 prompts, 815,000 prompt-page pairs, 17 weeks of data. The finding that matters for anyone reporting “AI visibility” to a board is this. Across three identical ChatGPT runs of the same prompt, only 2.2 to 2.3% of cited sources persisted. Within a single model, identical prompts produced 10 to 34% variance in their citations.
Run the same query three times. Get three substantially different answers about who gets cited. That is not a tracking error to be cleaned up later. It is the native behavior of the surface.
The churn extends week to week. Google AI Mode replaces 56% of its cited sources every week. ChatGPT replaces 74%. The sources an LLM trusts this Tuesday are mostly gone by next Tuesday, and they were never stable inside a single day to begin with.
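To make the two measurements concrete, here is a minimal sketch of how persistence across runs and week-over-week churn could be computed from sets of cited sources. The definitions are assumptions for illustration (persistence as sources cited in every run divided by all distinct sources cited, churn as last week's sources missing this week), and the domains are hypothetical. Growth Memo's exact methodology may differ.

```python
from typing import Iterable, Set


def persistence_rate(runs: Iterable[Set[str]]) -> float:
    """Share of all distinct cited sources that appear in every run."""
    run_sets = [set(r) for r in runs]
    if not run_sets:
        return 0.0
    union = set().union(*run_sets)
    if not union:
        return 0.0
    common = set.intersection(*run_sets)
    return len(common) / len(union)


def weekly_churn(last_week: Set[str], this_week: Set[str]) -> float:
    """Share of last week's cited sources that are absent this week."""
    if not last_week:
        return 0.0
    return len(last_week - this_week) / len(last_week)


# Hypothetical citations from three identical runs of one prompt.
runs = [
    {"a.com", "b.com", "c.com"},
    {"a.com", "d.com", "e.com"},
    {"a.com", "f.com", "g.com"},
]
print(f"persistence: {persistence_rate(runs):.1%}")  # 1 shared source of 7 distinct
print(f"churn: {weekly_churn({'a.com', 'b.com', 'c.com', 'd.com'}, {'a.com', 'e.com'}):.0%}")
```

Both functions work on sets, so they ignore citation order and repeat mentions within a single answer.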
Why This Breaks the Old Reporting Model
Classic SEO measurement assumed a deterministic surface. You typed a keyword, Google returned a ranked list, and that list held still long enough to measure. Position 4 today meant position 4 tomorrow, give or take. A dashboard could report a number and the number meant something because the underlying surface was stable.
AI search has no such floor. The model samples from a probability distribution at generation time, so each run is a draw, not a lookup. Reporting “we appear in 30% of AI answers” from a single run is like reporting a national poll from one phone call. The figure is real in the narrow sense that you observed it. It is also statistically meaningless as a description of the population.
This is where most AEO dashboards quietly fail. They inherited the SEO habit of one query, one reading, one number, and applied it to a surface that violates the assumption the habit was built on. The number on the slide is a single coin flip presented as a batting average.
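The polling analogy can be put in numbers. The sketch below uses the textbook normal-approximation margin of error for a proportion, and it assumes each run is an independent draw, which may not hold if a platform caches or personalizes answers. The function names are illustrative, not part of any tracking tool.

```python
import math

Z_95 = 1.96  # normal critical value for 95% confidence


def margin_of_error(p: float, n: int, z: float = Z_95) -> float:
    """Half-width of the normal-approximation interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)


def runs_needed(p: float, margin: float, z: float = Z_95) -> int:
    """Smallest run count whose margin of error is at most `margin`."""
    return math.ceil(z * z * p * (1 - p) / (margin * margin))


# Assumed true appearance rate of 30%, as in the example above.
for n in (1, 10, 100, 1000):
    print(f"{n:>5} runs: +/- {margin_of_error(0.30, n):.1%}")

print(runs_needed(0.30, 0.05))  # runs required for a +/- 5 point margin
```

Under these assumptions a single run carries a margin of roughly 90 points, and holding a 30% reading to within 5 points takes a few hundred runs per prompt.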
The Second Shift: The Keyword Itself Is Being Retired
Even if you fixed the sampling problem, Google is removing the unit you used to measure with.
For AI Max campaigns, Google introduced the AI Brief, a Gemini-powered prompt layer that replaces keyword lists with messaging, matching, and audience guidelines (Search Engine Land, June 2026). Advertisers stop handing the system a list of exact strings to bid on. They describe intent in natural language, and Gemini interprets it. The supporting data tells you why. Exact-match has lost roughly 10 percentage points of spend share since 2022, and AI Mode queries run about three times longer than traditional search queries (Search Engine Land, analyzing 30,000 Google Ads accounts, February 2026).
The keyword was never just an ad-buying mechanism. It was the atomic unit of search measurement. Volume, rank, share of voice, and competitive overlap were all denominated in keywords. Retire the keyword and every metric built on top of it loses its base unit. You cannot report “share of voice for term X” when the system no longer thinks in terms of X.
So two things broke at once. The surface became non-deterministic, and the measuring stick got pulled. The governance job that depended on both is now standing on neither.
What Replaces Single-Run Tracking: Polling Discipline
The fix is not a better scraper. It is a different statistical posture. Stop treating AI visibility as a fact to be read and start treating it as a population to be estimated. That means importing the discipline of survey research.
Three practices move you from anecdote to signal:
Repeated sampling. A single run is one respondent. Run each prompt many times, on a schedule, and report the distribution, not the last value. The roughly 2% persistence figure is not a reason to give up. It is the precise reason you must sample repeatedly, the same way a pollster calls thousands of people because no single call represents the country.
Confidence intervals over point estimates. Replace “we appear in 30% of answers” with “we appear in 25 to 35% of answers across N runs, 95% confidence.” A board that sees a range understands the surface is probabilistic. A board that sees a single percentage will anchor on a number that was never stable. The interval is the honest unit.
Persona journeys instead of keyword lists. AI Mode queries are about three times longer than traditional queries, and they are conversational. The right input is no longer a flat keyword set but a set of representative buyers asking real, multi-turn questions. Model the personas, run their actual journeys, and measure where you surface across that conversation. This also happens to be the only input shape that survives the AI Brief transition, because it speaks the language Gemini now interprets.
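The first two practices can be sketched together: collect a yes or no outcome per run, then report a range instead of a point. This example uses the Wilson score interval, one common choice for proportions that behaves sensibly at small sample sizes. The `summarize` helper and its output format are assumptions for illustration, and the run outcomes are made up.

```python
import math
from typing import List, Tuple


def wilson_interval(hits: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (z=1.96 gives about 95%)."""
    if n <= 0:
        raise ValueError("need at least one run")
    p = hits / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def summarize(appearances: List[bool]) -> str:
    """Turn per-run outcomes (True = brand was cited) into a board-ready line."""
    n = len(appearances)
    hits = sum(appearances)
    low, high = wilson_interval(hits, n)
    return f"{hits / n:.0%} observed, {low:.0%} to {high:.0%} (95% CI, N={n})"


# Hypothetical: the brand was cited in 90 of 300 runs of one prompt.
print(summarize([True] * 90 + [False] * 210))
# -> 30% observed, 25% to 35% (95% CI, N=300)
```

Feed the same function a single run in which the brand was cited and the interval stretches from roughly 21% to 100%, which is the honest way to show how little one run says.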
The Governance Reframe
Leadership will keep asking the same question: are we visible in AI search? The defensible answer changed shape. It used to be a position and a trend line. Now it is an estimate with an error bar, refreshed continuously, segmented by persona and by platform.
Anyone still presenting a single AI-visibility percentage from a single tracking run is reporting noise with a decimal point. The decimal makes it look rigorous. The reproducibility data says it is one draw from a distribution they did not characterize. The credible operator is the one who can show the board the interval, the sample size, and the refresh cadence, and explain why the number moves.
Do This Now
- Audit your AI-visibility reports for run count. If any number on a slide comes from a single prompt run, label it provisional today. One run is one respondent.
- Switch to repeated sampling with intervals. Run each tracked prompt on a schedule, at least daily given 56 to 74% weekly source churn, and report ranges with sample sizes, not point estimates.
- Convert keyword lists to persona journeys. Replace flat term lists with representative buyers running real multi-turn queries. This survives the AI Brief transition and matches how AI Mode is actually used.
- Report per platform, never blended. Google AI Mode and ChatGPT churn at different rates and cite different sources. Keep their intervals separate.
- Brief leadership on the new unit. The deliverable is an estimate with an error bar, refreshed continuously. Set that expectation before someone screenshots a single-run number and treats it as truth.
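The audit and per-platform steps in the checklist above could be sketched as follows. The record shape, the `audit` function, and the 30-run threshold are all assumptions for illustration. Pick a threshold from the margin of error you are willing to defend, not from this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# One record per run: (platform, prompt, brand_was_cited). Values are hypothetical.
Record = Tuple[str, str, bool]


def audit(records: Iterable[Record], min_runs: int = 30) -> List[dict]:
    """Group runs by platform and prompt, never blended, and flag thin samples."""
    grouped: Dict[Tuple[str, str], List[bool]] = defaultdict(list)
    for platform, prompt, cited in records:
        grouped[(platform, prompt)].append(cited)

    report = []
    for (platform, prompt), outcomes in sorted(grouped.items()):
        n = len(outcomes)
        report.append({
            "platform": platform,
            "prompt": prompt,
            "runs": n,
            "rate": sum(outcomes) / n,
            # min_runs is an arbitrary illustrative cutoff, not a standard.
            "status": "provisional" if n < min_runs else "reportable",
        })
    return report


records: List[Record] = [("chatgpt", "best crm for startups", True)]
records += [("google_ai_mode", "best crm for startups", i % 4 == 0) for i in range(40)]

for row in audit(records):
    print(row)
```

The single ChatGPT run shows a 100% rate and is flagged provisional, while the 40 Google AI Mode runs are reported on their own row instead of being averaged into one blended figure.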
Search did not collapse. The market got that prediction exactly backward. What collapsed is the quiet assumption that a search result is a stable thing you can read once and report. Measure it like a poll, or stop claiming you measure it at all.
This analysis synthesizes “How to Make Prompt Tracking Much More Accurate” (Growth Memo, June 2026), “Google AI Brief May Be the Replacement Keywords Never Had” (Search Engine Land, June 2026), and “ChatGPT Failed to Kill Google Search” (Sherwood News, June 2026).
Victorino Group helps teams build AI-search measurement they can defend to a board. Let’s talk.
All articles on The Thinking Wire are written with the assistance of Anthropic's Opus LLM. Each piece goes through multi-agent research to verify facts and surface contradictions, followed by human review and approval before publication. If you find any inaccurate information or wish to contact our editorial team, please reach out at editorial@victorinollc.com.