The AI Control Problem

Delegation Is Not Decomposition: What Google DeepMind Gets Right About AI Agents

Thiago Victorino

A paper from Google DeepMind landed this month that deserves careful attention --- not for its solutions, which are largely speculative, but for its problem diagnosis, which is the clearest articulation yet of why multi-agent AI systems keep failing in production.

The paper is called “Intelligent AI Delegation” by Nenad Tomasev, Matija Franklin, and Simon Osindero. Its central argument is that the AI industry has been treating delegation as a technical routing problem when it is actually a governance problem. The distinction matters more than it appears to.

The Core Distinction

Most multi-agent frameworks --- LangGraph, CrewAI, AutoGen --- treat delegation as decomposition. You have a complex task. You break it into subtasks. You route each subtask to a capable agent. You collect the results. Done.

The DeepMind authors argue this misses three elements that real delegation requires: authority transfer, accountability chains, and trust mechanisms. Breaking a task into pieces is decomposition. Delegation means one entity grants another the authority to act on its behalf, with clear accountability for the outcome and justified confidence that the delegate can deliver.

This is not a pedantic distinction. It is the difference between a software architecture and a governance architecture. And the absence of governance is precisely what breaks multi-agent systems in production.

The Concepts Worth Keeping

The paper introduces several ideas with real explanatory power. Not all of them are equally actionable, but the vocabulary alone advances the conversation.

The Complexity Floor

Below a certain task complexity, delegation overhead exceeds the value of the task itself. Setting up authority transfer, accountability chains, verification protocols, and trust mechanisms costs something. If the task is simple enough, that cost exceeds what you save by delegating.

This concept explains a pattern we see constantly in practice: multi-agent proofs of concept that work on complex demo scenarios but fail to deliver ROI on real workloads. The workloads were below the complexity floor. A single well-prompted agent would have been cheaper and more reliable.

Forrester predicts that 75% of companies building their own agentic systems will fail. The complexity floor is one reason why. Many organizations are building delegation infrastructure for tasks that do not warrant delegation.
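The floor can be made concrete with a rough cost comparison. This is an illustrative sketch, not a mechanism from the paper; the cost categories and figures are our own assumptions.

```python
from dataclasses import dataclass

@dataclass
class DelegationCost:
    """Illustrative governance overhead for one delegated subtask."""
    authority_transfer: float   # e.g. credential scoping, permission setup
    verification: float         # defining and running output checks
    accountability: float       # logging, audit trail, escalation wiring

    def total(self) -> float:
        return self.authority_transfer + self.verification + self.accountability

def above_complexity_floor(task_value: float, overhead: DelegationCost) -> bool:
    """Delegate only when the task's value exceeds the governance overhead."""
    return task_value > overhead.total()

overhead = DelegationCost(authority_transfer=2.0, verification=3.0, accountability=1.0)
print(above_complexity_floor(task_value=4.0, overhead=overhead))   # False: use a single agent
print(above_complexity_floor(task_value=50.0, overhead=overhead))  # True: delegation can pay off
```

The units do not matter; the comparison does. If you cannot state the governance overhead at all, that is itself a sign the system has no governance to account for.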

Cognitive Monoculture

When every agent in a multi-agent system runs on the same foundation model, failures become correlated. A hallucination pattern in GPT-4 will propagate through every agent in a GPT-4-based delegation chain. A reasoning weakness in Claude will affect every Claude-based node. The system has no diversity of failure modes.

The paper calls this “cognitive monoculture,” drawing on Kleinberg and Raghavan’s work on algorithmic monoculture in decision-making (PNAS, 2021). The biological analogy is precise: genetic monocultures in agriculture produce higher yields under normal conditions and catastrophic loss under stress. The same dynamic applies to model monocultures in agent systems.

This is a genuinely non-obvious insight. Most organizations building multi-agent systems optimize for consistency --- same model, same provider, same inference pipeline. The DeepMind paper argues this optimization creates correlated risk that no amount of individual agent testing will reveal.
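The risk concentration is easy to see in a back-of-the-envelope probability sketch. The flaw rates below are illustrative, and the diverse case assumes failures are fully independent, which real models only approximate.

```python
def p_chain_failure_monoculture(p_model_flaw: float) -> float:
    # One shared model: a single model-level flaw propagates to every agent,
    # so the whole chain fails whenever the model does.
    return p_model_flaw

def p_chain_failure_diverse(p_model_flaw: float, n_agents: int) -> float:
    # Independent models with the same per-model flaw rate: the whole chain
    # is wrong only if every model fails on this input (independence assumed).
    return p_model_flaw ** n_agents

p, n = 0.05, 3
print(p_chain_failure_monoculture(p))  # 0.05
print(p_chain_failure_diverse(p, n))   # 0.000125
```

Individual agent testing measures the per-node rate and never sees the correlation, which is exactly why the monoculture risk stays hidden until stress conditions hit.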

Moral Crumple Zones

The paper references Madeleine Clare Elish’s 2019 concept of “moral crumple zones” --- structural positions where humans absorb blame for system failures they could not meaningfully prevent. In the context of AI delegation, this describes the “human-in-the-loop” checkpoint that exists to satisfy compliance requirements without granting actual control.

A human who reviews 200 AI decisions per hour is not exercising oversight. They are absorbing liability. The approval rate will converge toward 100% because the cognitive load of meaningful review at that volume is unsustainable. The human becomes a rubber stamp with legal exposure.
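The arithmetic behind that claim is worth spelling out. The 120-second threshold below is an illustrative assumption, not a regulatory standard; pick your own floor for what meaningful review takes in your domain.

```python
def seconds_per_review(decisions_per_hour: int) -> float:
    """Average time a reviewer can spend on each decision."""
    return 3600 / decisions_per_hour

def is_meaningful_oversight(decisions_per_hour: int,
                            min_seconds_per_review: float = 120.0) -> bool:
    """Crude capacity check: does the reviewer get enough time per decision?
    The default threshold is an illustrative assumption, not a standard."""
    return seconds_per_review(decisions_per_hour) >= min_seconds_per_review

print(seconds_per_review(200))        # 18.0 seconds per decision
print(is_meaningful_oversight(200))   # False
print(is_meaningful_oversight(20))    # True under this threshold
```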

This directly challenges the default approach to AI governance: insert a human approval step and declare the system supervised. The DeepMind paper argues --- correctly, in our experience --- that supervision without the capacity to supervise is theater.

KPMG’s Q4 AI Pulse Survey found 75% of leaders prioritize security and compliance for AI agents. MIT Sloan and BCG research found 69% of executives agree agentic AI requires fundamentally new management approaches. The intent is there. The implementation remains superficial.

Liability Firebreaks

The paper proposes “liability firebreaks” --- predefined contractual boundaries where an agent either assumes full, non-transferable liability for its output or halts execution and escalates. No current framework implements this. But the concept is important because it forces an answer to a question most multi-agent architectures avoid: when this agent makes a consequential error, who is responsible?

In production multi-agent systems today, the answer is usually “it depends” or “nobody specifically.” This ambiguity is tolerable in demos. It is not tolerable under the EU AI Act, whose enforcement provisions take effect in August 2026, with penalties of up to EUR 35 million or 7% of global turnover.
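Since no framework implements firebreaks, here is one hypothetical shape they could take: every output either carries a non-transferable liability assignment or the agent halts and escalates. The names, the confidence gate, and the threshold are all our assumptions, not the paper's design.

```python
from dataclasses import dataclass

class EscalationRequired(Exception):
    """Raised when an agent declines liability and halts instead of acting."""

@dataclass
class SignedOutput:
    result: str
    liable_party: str   # non-transferable: fixed at the firebreak boundary

def firebreak(agent_id: str, result: str, confidence: float,
              threshold: float = 0.9) -> SignedOutput:
    """Hypothetical liability firebreak: the agent either signs for its output,
    assuming full liability, or halts and escalates to the delegator."""
    if confidence < threshold:
        raise EscalationRequired(f"{agent_id} declines liability; escalating")
    return SignedOutput(result=result, liable_party=agent_id)

out = firebreak("summarizer-1", "contract summary", confidence=0.95)
print(out.liable_party)  # summarizer-1

try:
    firebreak("summarizer-1", "ambiguous clause", confidence=0.4)
except EscalationRequired as e:
    print(e)
```

The point of the sketch is the invariant, not the mechanism: there is no code path where a consequential output exists without a named liable party.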

Contract-First Decomposition

The paper’s most practically useful idea: only decompose a task into subtasks if you can verify each subtask’s output. If you cannot define what a correct result looks like for a subtask, you cannot delegate it safely. Verification comes first. Decomposition follows.

This inverts the standard approach. Most teams start with “what can we parallelize?” The DeepMind paper argues you should start with “what can we verify?” Everything verifiable can be delegated. Everything else stays with the delegator or gets restructured until it becomes verifiable.
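A minimal sketch of what verification-first planning looks like in practice, with hypothetical names: a subtask may be delegated only if it ships with a verifier defining what a correct output looks like; everything else stays with the delegator.

```python
from typing import Callable, Optional

# A verifier takes a candidate output and says whether it is acceptable.
Verifier = Callable[[str], bool]

def plan_delegation(subtasks: dict) -> tuple:
    """Split subtasks into delegable (verifier defined) and retained (no
    definition of correctness, so delegation would be a gamble)."""
    delegable = [name for name, v in subtasks.items() if v is not None]
    retained = [name for name, v in subtasks.items() if v is None]
    return delegable, retained

subtasks = {
    "extract_invoice_total": lambda out: out.replace(".", "").isdigit(),
    "summarize_meeting_tone": None,   # no definition of "correct": not delegable
}
delegable, retained = plan_delegation(subtasks)
print(delegable)  # ['extract_invoice_total']
print(retained)   # ['summarize_meeting_tone']
```

Restructuring then means turning retained subtasks into verifiable ones, for example replacing "summarize the tone" with "classify the tone into one of five labels," which a verifier can check.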

Where the Paper Falls Short

The problem diagnosis is strong. The proposed solutions are not.

The paper is entirely theoretical. There are no experiments, no benchmarks, no empirical validation of any proposed mechanism. The technical solutions it suggests --- zero-knowledge proofs for delegation chains, smart contracts for accountability enforcement, game-theoretic consensus for conflict resolution --- are individually interesting and collectively impractical at current scale.

The paper approvingly cites TrueBit, a verification protocol built on blockchain smart contracts. TrueBit collapsed 99.9% in value in January 2026 after a $26.6 million exploit. This does not invalidate the concept of computational verification, but it demonstrates the distance between theoretical elegance and production reliability.

More fundamentally, the heavyweight governance mechanisms the paper proposes may raise the complexity floor so high that most tasks fall below it. If your delegation framework requires zero-knowledge proofs, game-theoretic consensus, and smart contract enforcement for every task handoff, you have solved the governance problem by making delegation impractical for ordinary use cases.

The practical frameworks that exist today --- LangGraph, CrewAI, AutoGen --- work imperfectly but work in production. They solve real problems for real organizations. The DeepMind paper identifies genuine gaps in these frameworks without offering implementable alternatives.

There is also an institutional bias worth noting. A paper from Google DeepMind that advocates for heavyweight governance mechanisms naturally favors organizations with the resources to implement them. The paper itself acknowledges that safety may become a “luxury good” --- verification mechanisms that add cost, making robust delegation available only to well-resourced users. This is an honest admission that doubles as an unintentional argument for the status quo.

What Practitioners Should Take From This

The paper’s value is diagnostic, not prescriptive. Here is what holds up.

The distinction between delegation and decomposition is real. If your multi-agent system has no concept of authority transfer, accountability chains, or trust mechanisms, you have built a task router, not a delegation framework. Task routers work for simple cases. They fail when consequences matter.

The complexity floor explains real failures. Before building a multi-agent system, calculate the governance overhead and compare it to the task value. Many tasks that seem like delegation candidates are better handled by a single well-configured agent with good tooling.

Cognitive monoculture is an underpriced risk. If every agent in your system runs on the same model, you have optimized for consistency at the expense of resilience. Consider model diversity for critical paths --- not because different models are better, but because they fail differently.

Human-in-the-loop is not governance by default. If your human oversight step processes decisions at a rate that precludes meaningful review, it is not oversight. It is liability absorption. Design for review capacity, not just review presence.

Verification-first decomposition is immediately actionable. Before delegating any subtask, define what a correct output looks like. If you cannot define correctness, you cannot verify it. If you cannot verify it, delegation is a gamble.

The market is moving fast. Gartner estimates 40% of enterprise applications will embed AI agents by end of 2026. Deloitte projects $52 billion in AI agent orchestration spend by 2030. AI incidents increased 21% from 2024 to 2025 according to the AI Incidents Database. KPMG found 65% of leaders cite system complexity as the top barrier to agent deployment.

The gap between deployment velocity and governance maturity is widening. The DeepMind paper names this gap precisely, even if it does not close it.

The Governance-First Thesis

The deepest insight in the paper is one sentence buried in the middle: delegation without governance is just distribution of failure.

We have argued this position consistently. Governance is not a layer you add after the system works. It is the architectural foundation that determines whether the system works at all. The vocabulary the DeepMind paper introduces --- liability firebreaks, complexity floor, cognitive monoculture, contract-first decomposition --- gives practitioners better tools to reason about these problems.

The solutions will come from practitioners, not from theoretical papers. They will be incremental, pragmatic, and imperfect. They will look more like better orchestration frameworks with governance primitives built in than like zero-knowledge proof chains for every task handoff.

But the diagnosis matters. You cannot fix a problem you have not named. And this paper names the governance gap in multi-agent AI more precisely than anything else published this year.


Sources

  • Nenad Tomasev, Matija Franklin, Simon Osindero. “Intelligent AI Delegation.” Google DeepMind, February 2026. arXiv:2602.11865v1.
  • Madeleine Clare Elish. “Moral Crumple Zones: Cautionary Tales in Human-Robot Interaction.” Engaging Science, Technology, and Society, 2019.
  • Jon Kleinberg, Manish Raghavan. “Algorithmic Monoculture and Social Welfare.” PNAS, 2021.
  • Gartner. Enterprise AI Agent Adoption Forecast, 2026.
  • Forrester. Agentic AI Predictions, 2026.
  • KPMG. Q4 AI Pulse Survey, 2025.
  • MIT Sloan Management Review / BCG. AI Management Approaches Research, 2025.
  • Deloitte. AI Agent Orchestration Market Forecast, 2026.
  • AI Incidents Database. Annual Incident Report, 2025.

Victorino Group helps organizations build governance infrastructure for multi-agent AI systems --- from delegation architecture through production operations. If your agent systems work in isolation but fail in coordination, the problem is structural.

If this resonates, let's talk

We help companies implement AI without losing control.

Schedule a Conversation