McKinsey's Agent Factory Prescription Is Missing Its Own Diagnosis
McKinsey’s latest article on AI agents opens with a number that should stop every reader cold. Nearly 8 in 10 organizations report no significant bottom-line gains from AI. The firm buries this in the second paragraph and spends the remaining 3,000 words proposing a solution called “agent factories.”
We have been tracking McKinsey’s evolving AI diagnosis across three previous articles. Measurement problems became design problems, then the word “governable” appeared once without elaboration. This fourth installment completes a pattern: McKinsey keeps arriving at the right number, then prescribing the wrong fix.
The 80% Number Is the Most Honest Thing in the Article
Start with what they got right. The statistic is credible. It aligns with the NBER’s February 2026 survey of roughly 6,000 executives, where 89% reported zero labor productivity impact from AI. It aligns with BCG’s AI Radar 2025, where only 26% of companies generated significant financial returns. It aligns with Gartner’s prediction that 40% of agentic AI projects will be canceled by 2027.
Multiple independent sources, different methodologies, converging on the same conclusion: most AI investments are not producing measurable returns.
This convergence is important because McKinsey’s previous AI research relied on perception data. As we analyzed in McKinsey Measured the Wrong Thing, their 2025 developer productivity study asked 300 executives what they believed was happening. METR’s randomized controlled trial found the opposite. Developers were 19% slower with AI tools while believing they were 24% faster.
The 80% number is different. It measures outcomes. And it says most organizations are failing.
What McKinsey Proposes: Agent Factories
The prescription is an “agent factory.” The concept: a centralized capability that identifies, builds, deploys, and operates AI agents across business functions. McKinsey describes a five-step process: identify high-value workflows, prototype agents, deploy with monitoring, measure, scale.
They are not the only ones using the term. Microsoft published an Agent Factory white paper in February 2026. Oracle adopted the phrase in March 2026. The term is converging across the industry.
But convergence on a label does not mean convergence on a new idea. Agent factories are AI Centers of Excellence with updated branding. Deloitte, IBM, Oracle, Microsoft, and KPMG have documented the CoE model for over a decade. Centralize expertise, standardize processes, govern deployment, scale across the organization. The structure is identical. The technology stack is different.
Rebranding is not inherently a problem. If agent factories work, the label does not matter. The problem is what the label obscures.
The Case Studies Are Unverifiable
McKinsey offers three case studies to support the agent factory model. A European insurer with 2-3x conversion improvement. A US airline with 210%, 800%, and 59% improvements across three metrics. A US homebuilder that analyzed 500,000 transcripts and tripled conversion rates.
All three are anonymous. None provide enough detail to verify independently. Cross-referencing the airline numbers against public data suggests they may originate from a telecom case study rather than an airline.
This matters because McKinsey is citing results from its own consulting engagements to recommend more consulting engagements. The commercial circularity is a pattern we documented in our first analysis of their methodology. Every case study is a McKinsey client. The 80% that failed are invisible. Survivorship bias is baked into the evidence base.
The article’s source structure reinforces the problem. Nearly every citation links to another McKinsey publication. No independent research. No academic studies. No controlled trials. The evidence is circular.
The Productivity Projections Contradict Independent Data
McKinsey projects 3-5% annual productivity gains from agents and 10%+ growth lift from agent-enabled go-to-market strategies. These numbers need context.
METR’s RCT found that experienced developers were slower, not faster, with AI assistance. The NBER survey found that executives predict only 1.4-1.9% productivity gains over three years. McKinsey’s projections run roughly two to four times higher than what independent sources expect.
Optimistic projections are not wrong by definition. But when the gap between your projections and every independent measurement runs in the same direction, the burden of evidence shifts. McKinsey has not provided evidence that clears that burden.
The Structural Contradiction
Here is where the article defeats itself.
McKinsey identifies, correctly, that governance failures cause the 80% failure rate. Organizations deploy AI without measurement infrastructure, without workflow boundaries, without accountability mechanisms. The agents operate in a vacuum. The results are predictable.
Then McKinsey prescribes agent factories. The five-step process covers identification, prototyping, deployment, measurement, and scaling. Governance appears as a parenthetical concern. Build faster, build more, scale across the organization. The factory metaphor is telling: factories optimize for throughput.
But the diagnosis says throughput is not the problem. Organizations already have AI. They have agents. They have pilots. What they do not have is the infrastructure to know whether any of it works. More agents deployed faster, without the governance layer to evaluate outcomes, is an acceleration of the existing failure mode.
Gartner’s projection adds weight. If 40% of agentic AI projects face cancellation by 2027 due to governance deficits, unclear ROI, and uncontrolled costs, then an agent factory without governance is a machine for producing projects that get canceled.
The Missing Layer
Agent factories need a governance layer the same way software factories need quality assurance. You would not run a manufacturing line without inspection. You would not deploy code without testing. But McKinsey describes an agent deployment pipeline with no equivalent control system.
What would that layer look like in practice?
Outcome measurement before scaling. The article says “measure.” It does not specify what. Governed measurement means tracking agent decisions against business outcomes, not tracking agent deployment volume. Did the agent’s recommendation improve the metric it targeted? Can you prove causation, not just correlation? Most organizations cannot answer these questions because they never built the measurement infrastructure.
Decision accountability. When an agent makes a recommendation and a human follows it, who is responsible for the outcome? In the homebuilder case study, agents analyzed 500,000 transcripts and recommended follow-ups. If those follow-ups damage customer relationships, the accountability chain is invisible. Governance defines who owns agent-influenced decisions.
Workflow boundaries. Agents need defined scopes. What can they access? What can they modify? What requires human approval? As we explored in Governance Gates Enterprise AI, the bottleneck for enterprise adoption is not capability. It is permissioning. Agent factories that deploy without explicit boundaries create uncontrolled surface area.
Independent verification. The article’s evidence comes entirely from McKinsey’s own work. Governed organizations verify internally. They run A/B tests. They measure control groups. They do not accept vendor case studies as proof of value. The same discipline applies to internal agent deployments.
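The A/B discipline described above can be sketched concretely. Here is a minimal example, assuming you can randomly split comparable leads into an agent-assisted group and a control group and count conversions in each. All counts are hypothetical, and the helper function is illustrative, not a tool named in the article.

```python
import math

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test: did the agent-assisted group (a)
    convert at a different rate than the control group (b)?"""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return p_a - p_b, z, p_value

# Hypothetical numbers: 540 conversions from 4,000 agent-assisted
# leads vs 430 conversions from 4,000 control leads.
lift, z, p = two_proportion_ztest(540, 4000, 430, 4000)
print(f"absolute lift={lift:.3f}  z={z:.2f}  p={p:.4f}")
```

The point is not the statistics; it is that the comparison requires a control group that exists before the agent ships. If every lead goes through the agent, the counterfactual is gone and no amount of dashboarding recovers it.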
The Pattern Across Four Articles
This is the fourth McKinsey article we have tracked in this series. The progression is consistent.
Article one measured perception and called it evidence. Article two diagnosed a design problem and stopped short of infrastructure. Article three used the word “governable” once, in passing, without defining it. Article four identifies 80% failure, correctly names the symptom, then prescribes more deployment.
Each article gets closer. Each stops at the same boundary. Governance is acknowledged. Governance infrastructure is never specified. The diagnosis improves. The prescription remains: hire McKinsey.
There is a structural reason for this, and it is not cynicism. Governance infrastructure is not a consulting engagement. It is an ongoing organizational capability. Consulting firms sell projects with defined scopes and end dates. Governance does not end. As we noted in our three diagnoses analysis, the consulting model and the governance model are structurally incompatible. This explains why the diagnosis keeps stopping where it does.
What to Do Instead
If your organization is planning an agent factory, the playbook is straightforward.
Build governance first. Before deploying a single agent at scale, define how you will measure its impact, who owns its decisions, what boundaries constrain its actions, and how you will verify outcomes independently. This is not a phase. It is the foundation the factory sits on.
Treat the 80% number as your baseline. If four out of five AI implementations produce no measurable return, your default assumption for any new agent should be that it will not work until proven otherwise. The burden of proof falls on the agent, not on the skeptic.
Do not accept anonymous case studies as evidence. If a vendor or consultant cannot name the company, describe the methodology, or provide independently verifiable results, the case study is a testimonial. Testimonials are marketing. They are not evidence.
Measure outcomes, not activity. Agent deployment count, agent interaction volume, agent adoption rate. None of these measure value. Measure what changed in the business because of the agent’s work. If you cannot isolate that signal, the measurement infrastructure is missing.
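The baseline logic above reduces to expected-value arithmetic. A sketch, using the article’s 80% failure rate as the prior; the cost and payoff figures are hypothetical and come from nowhere in McKinsey’s article.

```python
# Expected value of one agent pilot under the 80% failure baseline.
P_SUCCESS = 0.20          # the article's baseline: 4 in 5 produce no return
COST = 250_000            # hypothetical cost to build and run one pilot
PAYOFF = 1_000_000        # hypothetical return if the pilot succeeds

expected_value = P_SUCCESS * PAYOFF - COST
print(f"expected value per pilot: ${expected_value:,.0f}")

# Break-even payoff: what a successful pilot must return just to
# cover its own cost at this success rate.
break_even = COST / P_SUCCESS
print(f"break-even payoff: ${break_even:,.0f}")
```

At a 20% success rate, a pilot must return five times its cost, conditional on succeeding, merely to break even across the portfolio. That is the arithmetic behind putting the burden of proof on the agent.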
McKinsey will likely specify the governance layer in a future article. The trajectory across four publications points there. Until then, the 80% failure rate is their most useful contribution. It tells you exactly how likely your current approach is to fail if you scale without governance.
This analysis synthesizes McKinsey’s Agents for Growth: Turning AI Promise into Impact (March 2026), the NBER executive survey on AI productivity (February 2026), METR’s randomized controlled trial of AI coding tools (2025), Gartner’s agentic AI project forecast (2025), and BCG’s AI Radar 2025.
Victorino Group builds the governance infrastructure that agent factories need before they can produce results. Let’s talk.
All articles on The Thinking Wire are written with the assistance of Anthropic's Opus LLM. Each piece goes through multi-agent research to verify facts and surface contradictions, followed by human review and approval before publication. If you find any inaccurate information or wish to contact our editorial team, please reach out at editorial@victorinollc.com.