AI Brand Flattening Is Architectural, Not a Prompting Problem

Every marketing leader who has watched their AI-drafted copy come back sounding exactly like a competitor’s has reached for the same explanation: we need better prompts. Tighter brand guidelines. A voice document the model can study. More examples in the system message. The assumption underneath all of it is that distinctiveness is a tuning problem, and that with enough instruction the model will eventually sound like you instead of like everyone.

That assumption is wrong, and a NeurIPS 2025 Best Paper now has the numbers to prove it.

The homogenization of AI-generated brand content is not a workflow failure. It is structural. A large language model is, in effect, a compression of the statistical center of the text it was trained on, and when you ask it to produce copy, its default trajectory bends toward that center. Prompts and guidelines nudge the output at the margins. They do not move the substrate. The substrate is the mean.

The data is worse than the anecdotes

For a long time the flattening was an impression. Marketers felt their content was converging without being able to measure it. That ended when Liwei Jiang and colleagues ran 26,000 queries across more than 70 large language models and measured how similar the outputs actually were. The headline numbers are stark: GPT-4o and DeepSeek-V3 produced identical phrasing 81% of the time. DeepSeek and Qwen hit 82%. These are different companies, different training pipelines, different countries of origin, converging on the same words four times out of five.

This is not models copying each other. It is models independently discovering the same lowest-energy path through language, because they were all trained to predict the most probable next token over roughly the same corpus of human text. The most probable continuation is, almost by construction, the most typical one. When two models agree 81% of the time on phrasing, they are not agreeing with each other. They are both agreeing with the mean.
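To make "agreeing on phrasing" concrete, here is a minimal sketch of one way to score overlap between two pieces of copy. It is an illustration only: the metric (Jaccard overlap of word trigrams), the function names, and the sample sentences are all assumptions made for this example, and it is not the measure used in the study cited above.

```python
def word_ngrams(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Return the set of lowercase word n-grams in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def phrasing_overlap(a: str, b: str, n: int = 3) -> float:
    """Jaccard overlap of word n-grams: 0.0 is no shared phrasing, 1.0 is identical."""
    grams_a, grams_b = word_ngrams(a, n), word_ngrams(b, n)
    if not grams_a and not grams_b:
        return 0.0  # both texts too short to form any n-gram
    return len(grams_a & grams_b) / len(grams_a | grams_b)


# Hypothetical outputs, invented for illustration.
model_a = "Unlock your potential with our seamless all-in-one platform"
model_b = "Unlock your potential with our powerful all-in-one platform"
human = "We broke three prototypes before the hinge stopped squeaking"

print(phrasing_overlap(model_a, model_b))  # shared stock phrases score above zero
print(phrasing_overlap(model_a, human))    # no shared trigrams
```

A crude score like this will miss paraphrase, but it is enough to put a number on the impression that two drafts reach for the same stock phrases.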

The second-order effect is already measurable. By one estimate, 74.2% of new web pages now contain detectable AI content. The corpus that future models train on is increasingly the output of past models, which means the distribution is contracting toward its own center. Researchers call this model collapse: the distribution narrows, the tails thin out, and the center gets denser with each generation. The flattening compounds.
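The compounding can be illustrated with a toy simulation. This is a deliberately simplified sketch, not a model of real training: each "model" here just counts phrases in the previous generation's corpus and samples a new corpus from those counts. The function names, the vocabulary of 50 phrases, and the corpus size are assumptions chosen for the example.

```python
import random
from collections import Counter


def next_generation(corpus: list[str], size: int, rng: random.Random) -> list[str]:
    """'Train' by counting phrases, then 'generate' by sampling from those counts."""
    counts = Counter(corpus)
    phrases = list(counts)
    weights = [counts[p] for p in phrases]
    return rng.choices(phrases, weights=weights, k=size)


def distinct_per_generation(generations: int = 30, vocab: int = 50,
                            size: int = 100, seed: int = 0) -> list[int]:
    """Track how many distinct phrases survive when each generation trains on the last."""
    rng = random.Random(seed)
    # Generation 0: a stand-in "human" corpus where later phrases are increasingly rare.
    phrases = [f"phrase_{i}" for i in range(vocab)]
    human_weights = [1 / (i + 1) for i in range(vocab)]
    corpus = rng.choices(phrases, weights=human_weights, k=size)
    history = [len(set(corpus))]
    for _ in range(generations):
        corpus = next_generation(corpus, size, rng)
        history.append(len(set(corpus)))
    return history


history = distinct_per_generation()
print(history)  # distinct-phrase count per generation
```

In this toy setup a phrase that fails to be sampled in one generation can never return, so the count of distinct phrases can only hold steady or fall. The rare phrases in the tail are the first to go, which is the narrowing the paragraph above describes.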

Why the temperature knob does not save you

The obvious engineering response is to turn up the randomness. Most APIs expose a temperature parameter that controls how far the model is willing to stray from the most probable token. Crank it up and surely you get variety.

A PNAS Nexus study by Wenger and Kenett (March 2026) tested exactly this and found there is no usable setting. At low temperature the output is repetitive and on-mean. At high temperature it becomes incoherent. There is no band in between where the model produces text that is both distinctive and sensible: you move directly from generic to broken. The knob that was supposed to buy you originality buys you noise.
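For readers who have not looked under the knob, the sketch below shows the standard way temperature reshapes a next-token distribution: raw scores are divided by the temperature before being turned into probabilities. The four scores are hypothetical, and the example only illustrates the mechanism. It does not reproduce the study's finding about coherence.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw token scores into probabilities, dividing by temperature first."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy_bits(probs: list[float]) -> float:
    """Shannon entropy in bits: higher means the choice is more spread out."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical scores for four candidate next words; the first is the "safe" choice.
logits = [4.0, 2.0, 1.0, 0.5]

for t in (0.2, 1.0, 5.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs], round(entropy_bits(probs), 3))
```

At a low setting nearly all the probability sits on the safe word. At a high setting the four candidates approach equal odds, so the model is sampling closer to at random rather than choosing a better unusual word. Temperature spreads probability around without adding any information about which improbable choices are good ones.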

This is the architectural part of the argument, and it is the part that defeats every prompt-engineering workaround. Distinctiveness and probability are in tension by construction. A genuinely surprising sentence is, statistically, an improbable one, and an improbable one is exactly what the model is trained to avoid. You cannot prompt your way out of the objective function.

The trust cost is already being charged

If this were only an aesthetic problem, marketers could live with it. It is not. Klaviyo’s 2026 study of 8,000 consumers across eight countries found that when consumers detect AI involvement in brand content, they are roughly four times as likely to trust the brand less as to trust it more: 31% reported decreased trust against just 7% who reported an increase. Only 13% of consumers said they completely trust AI.

Read those numbers together with the flattening data and the strategic picture sharpens. The market is filling with content that sounds the same, and the audience is penalizing whoever they catch producing it. Sounding like everyone is not a neutral cost. It is a trust liability that consumers are actively pricing in.

Distinctiveness is inherited, not generated

Here is the consequence that most marketing teams have not absorbed. If the substrate is the mean, then distinctive voice can only come from something the model did not supply. It has to be imported into the prompt, not coaxed out of the model. A brand that had a genuinely different point of view before AI touched it can encode that difference and use the model as an amplifier. A brand whose distinctiveness lived only in the polish of its copywriting has nothing to import, because the polish was exactly the layer the model has now commoditized.

This is why we have argued that inimitable content is a product moat, and why the measured Google penalties on AI content are not an anomaly but a leading indicator. The brands that retain voice in 2026 are not the ones with the best prompts. They are the ones that had proprietary data, a real argument, a lived experience, or a contrarian position to begin with. Voice now accrues to substance. The model can amplify substance, and it can polish an absence of substance into the mean, but it cannot manufacture a difference that did not exist before you opened the prompt window.

The governance problem, then, sits upstream of the prompt. The question is not “how do we instruct the model to sound like us.” The question is “what do we have to say that is ours, and is it encoded somewhere the model can reach.” That is an editorial and strategic question, not a tooling one. Treating it as a tooling problem is how teams spend a quarter refining system prompts and end up with copy that reads like the competitor down the street.

Do this now

Run a flattening audit this week, and treat it as a strategy review rather than a prompt review.

Pull ten pieces of your AI-assisted copy from the last month. Strip the brand and product names from each one, mix them in with comparable pieces from competitors, and ask a colleague to guess which company wrote each. If they cannot tell, you have measured your own convergence on the mean.
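The guessing exercise is easier to act on if you score it. The sketch below is one possible scoring, under the assumption that the colleague attributes each anonymized piece to one of a fixed set of brands. The function name, the brand labels, and the sample guesses are hypothetical.

```python
def blind_attribution_score(guesses: list[str], truth: list[str],
                            n_brands: int) -> dict[str, float]:
    """Compare a reviewer's attribution accuracy with what random guessing would achieve."""
    if len(guesses) != len(truth) or not truth:
        raise ValueError("guesses and truth must be non-empty and the same length")
    if n_brands < 2:
        raise ValueError("a blind test needs at least two candidate brands")
    correct = sum(g == t for g, t in zip(guesses, truth))
    accuracy = correct / len(truth)
    chance = 1 / n_brands  # expected accuracy if the reviewer guessed at random
    return {"accuracy": accuracy, "chance": chance, "lift_over_chance": accuracy - chance}


# Hypothetical audit: ten anonymized pieces, three candidate brands.
truth = ["us", "rival_a", "us", "rival_b", "us",
         "rival_a", "us", "rival_b", "us", "us"]
guesses = ["us", "us", "rival_a", "rival_b", "rival_b",
           "rival_a", "rival_a", "us", "us", "rival_b"]

result = blind_attribution_score(guesses, truth, n_brands=3)
print(result)
```

An accuracy close to the chance figure is the warning sign: the reviewer is effectively guessing. Ten pieces is a small sample, so treat the score as a prompt for the strategy conversation rather than as a statistic.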

Then ask the harder question for each piece: what in here could only have come from us? A proprietary number, a customer story, a position no competitor would take, a piece of operational truth. If the answer is “nothing, it is just well-written,” the model has already commoditized it, and no prompt will rescue it.

Finally, move the work upstream. Stop investing in voice documents the model studies, and start investing in the substance the model amplifies. The teams that win the next two years of AI-assisted marketing are not the ones with the cleverest prompts. They are the ones who had something different to say before the prompt existed.

This analysis synthesizes The Great Flattening, Part 2 (State of Brand, May 2026).

Victorino Group helps marketing teams govern AI output for distinctiveness and trust, not just speed.