Hub-Spoke Costs 4x More for Coding, and Loses. The Multi-Agent Question Got Measured.

Thiago Victorino

For two years, multi-agent platforms have been sold with the same picture: a central orchestrator coordinating specialized workers. Hub-spoke. Manager and team. The org-chart analogy was so intuitive that nobody asked whether the architecture actually fits the workload.

Rohit Krishnan asked. He measured. The answer destroys the universal pitch.

In Why Smart Planners Lose to Simple Markets, Krishnan ran the same set of tasks through three architectures: solo (one agent does everything), hub-spoke (orchestrator delegates to workers), and markets (independent agents bid and compete). The results split cleanly along workload type, and the split is the opposite of what most procurement decks assume.
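
To make the three shapes concrete, here is a minimal sketch of each topology. This is illustrative structure only, not Krishnan's harness; the Agent callable, the subtask splitting, and the judge are all assumptions.

```python
# Minimal sketch of the three topologies. Agent, the splitting logic,
# and the judge are placeholders, not Krishnan's actual setup.
from typing import Callable, List

Agent = Callable[[str], str]  # prompt in, answer out

def solo(agent: Agent, task: str) -> str:
    # One agent holds the full context from start to finish.
    return agent(task)

def hub_spoke(orchestrator: Agent, workers: List[Agent], task: str) -> str:
    # The orchestrator decomposes the task, then serializes context into
    # each delegation; every handoff is a lossy boundary.
    subtasks = orchestrator(f"Split into {len(workers)} subtasks: {task}").split("\n")
    results = [w(s) for w, s in zip(workers, subtasks)]
    return orchestrator("Merge these results:\n" + "\n".join(results))

def market(agents: List[Agent], judge: Agent, task: str) -> str:
    # Independent attempts at the same task; a judge picks the winner.
    # No shared state crosses any boundary, so there is nothing to lose.
    attempts = [a(task) for a in agents]
    listing = "\n".join(f"{i}: {a}" for i, a in enumerate(attempts))
    winner = judge(f"Reply with the index of the best answer:\n{listing}")
    return attempts[int(winner.strip())]
```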

What the Numbers Say

For coding tasks: solo scored 7.2/10. Hub-spoke scored 6.7/10 at $5.33 per task, roughly 4x the cost of solo. The orchestrator architecture lost on quality and cost simultaneously.
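
Dividing those numbers makes the gap starker. One caveat: the source reports hub-spoke's cost and the 4x multiple, not solo's absolute cost, so the ~$1.33 below is inferred, not measured.

```python
# Back-of-the-envelope on the reported coding numbers. Solo's absolute
# cost is inferred from "roughly 4x", so ~$1.33 is an assumption.
hub_spoke = {"quality": 6.7, "cost": 5.33}
solo = {"quality": 7.2, "cost": 5.33 / 4}  # inferred, not measured

for name, arch in (("solo", solo), ("hub-spoke", hub_spoke)):
    print(f"{name}: {arch['quality'] / arch['cost']:.2f} quality points per dollar")
# solo delivers ~5.4 quality points per dollar; hub-spoke delivers ~1.3.
```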

For reasoning tasks: markets scored 7.1/10. Solo scored 5.1/10. Independent agents competing on the same problem produced better answers than a single agent thinking longer.

The pattern is structural, not coincidental. Coding requires continuous global state: what was decided three files ago, which interface contract was committed to, what the test fixtures expect. Splitting that across an orchestrator and workers introduces translation loss at every handoff. Krishnan’s verbatim observation about the failure mode: “if either step is wrong, the workers can be individually competent and the final answer still will get worse.” Each agent is fine. The seams between them are where quality dies.

Reasoning is the opposite shape. The task benefits from independent retries, different framings, different chains of thought, different shots at the same target. Markets exploit this. Solo cannot, because one agent can only think one way at a time, and self-doubt is not a substitute for genuine independent attempts.
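
A toy model shows why. The distribution below is a stand-in, not measured data; the only point is that the best of n independent draws improves with n, while one agent thinking longer is still a single draw.

```python
# Toy model of independent retries. The quality distribution is a
# stand-in for illustration, not anything from the benchmark.
import random

def attempt(seed: int) -> float:
    # Each independent attempt draws from a wide quality distribution.
    return random.Random(seed).gauss(5.0, 2.0)

def best_of(n: int) -> float:
    # A market keeps the best of n independent draws.
    return max(attempt(seed) for seed in range(n))

print(f"solo (1 draw):    {best_of(1):.1f}")
print(f"market (3 draws): {best_of(3):.1f}")
# The expected maximum rises with n; self-doubt inside one context
# window does not resample the distribution.
```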

Why the Org-Chart Analogy Breaks

The deeper question is why hub-spoke fails for coding when it works for human teams. Krishnan is direct: “AI agents are NOT like human agents. Models aren’t just models anymore either.”

Three structural disanalogies do the damage.

No persistent individual knowledge. A human worker accumulates project context across weeks. They remember the customer call where the requirement actually meant something different. They carry tacit knowledge that never made it into the spec. AI agents start every task with no memory beyond what gets stuffed into the prompt. The orchestrator cannot delegate to a worker who has internalized the project, because no such worker exists.

Tacit context cannot be passed in messages. Human teams transfer understanding through conversation, hallway exchanges, shared experience. The orchestrator-to-worker handoff for AI agents is a string of tokens. Everything the worker needs has to be serialized. Anything implicit gets lost. Coding lives or dies on implicit context: naming conventions, error-handling style, what “production-ready” means in this codebase.
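
A short sketch of that seam, with illustrative field names: whatever the orchestrator's serializer does not include, the worker never sees.

```python
# Sketch of a lossy handoff. Field names are illustrative; the point
# is that only what gets serialized survives the boundary.
from dataclasses import dataclass, field

@dataclass
class ProjectContext:
    spec: str
    naming: str = "snake_case, verbs for functions"
    errors: str = "never swallow exceptions; wrap with context"
    tacit: dict = field(default_factory=lambda: {
        "production-ready": "includes metrics and a rollback plan",
    })

def delegate(ctx: ProjectContext, subtask: str) -> str:
    # The orchestrator flattens context into tokens. Anything it does
    # not serialize (here: everything but the spec) is simply gone.
    return f"Task: {subtask}\nSpec: {ctx.spec}"

ctx = ProjectContext(spec="Add rate limiting to the API gateway")
print(delegate(ctx, "implement the middleware"))
# The worker gets the spec, not the conventions or the tacit meaning
# of "production-ready". The seam ate the context.
```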

Behavior changes with the prompt. A human worker has stable judgment across tasks. An AI agent’s quality is a function of how it was prompted this turn. The orchestrator’s prompt to the worker is itself a source of variance. Two slightly different delegations produce two materially different workers. There is no analog in human management: you do not get a different employee depending on how you phrased the request.

Put together: the org-chart analogy works at the metaphor level and breaks at the architecture level. Hub-spoke for coding is paying 4x for translation loss between agents that have nothing to translate.

Where Each Architecture Actually Fits

The measurements suggest a procurement checklist, not a universal answer.

Solo is right when the work requires continuous global state. Coding, refactoring, multi-file edits, anything where the second decision depends on remembering the first. The cost of a context handoff exceeds the cost of one agent doing it all. Krishnan’s coding numbers are the cleanest version of this argument: the cheaper architecture also won on quality.

Markets are right when the work benefits from independent retries. Reasoning, research, exploration, anything where seeing the same problem from genuinely different angles improves the answer. The 7.1 vs 5.1 gap for reasoning is the gap between three independent attempts and one agent thinking harder. Independent attempts win because they sample a wider distribution.

Hub-spoke is right when the subtasks are genuinely independent and the orchestrator’s role is real coordination. Workflow automation across distinct systems. Pipelines where step N’s output is step N+1’s clean input. Cases where the orchestrator is doing routing and aggregation, not high-bandwidth context transfer. The architecture works when the seams between agents are thin, when there is little to translate, because the boundary was already clean.
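
In miniature, a thin seam looks like this; the routing table and handlers are hypothetical.

```python
# Hub-spoke where it fits: routing and aggregation over independent
# subtasks. Systems and handlers here are hypothetical.
from typing import Callable, Dict, List

Handler = Callable[[dict], dict]

def orchestrate(route: Dict[str, Handler], jobs: List[dict]) -> List[dict]:
    # The orchestrator only reads a routing key and collects results.
    # No project context crosses the boundary, so none is lost at it.
    return [route[job["system"]](job) for job in jobs]

route = {
    "crm": lambda job: {**job, "status": "synced"},
    "billing": lambda job: {**job, "status": "invoiced"},
}
print(orchestrate(route, [{"system": "crm", "id": 1},
                          {"system": "billing", "id": 2}]))
```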

The vendor pitch for hub-spoke as the default architecture for coding agents is now measurably wrong. Not philosophically wrong. Wrong by 0.5 quality points and a 4x cost multiplier on the same benchmark.

The Procurement Question

Engineering leaders evaluating multi-agent platforms have been asked to compare features, integrations, and model coverage. Those are the wrong questions. The right question is whether the platform’s architecture matches the workload type, and whether it lets you switch architectures as workload types change inside the same organization.

A platform that only does hub-spoke is a platform betting that all your workloads look like coordination problems. Krishnan’s data says some of your most expensive workloads do not. Coding, the workload most engineering organizations are trying to accelerate first, is exactly the wrong shape for hub-spoke.
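
One way to make that operational is a workload-typed lookup; the categories below mirror the measurements, and the mapping is the whole argument in a dozen lines. A sketch, not a vendor feature.

```python
# Workload-typed architecture selection, sketched. The categories
# mirror the three workload shapes measured above.
from enum import Enum

class Workload(Enum):
    CONTINUOUS_STATE = "coding, refactoring, multi-file edits"
    INDEPENDENT_RETRIES = "reasoning, research, exploration"
    CLEAN_COORDINATION = "routing, aggregation, cross-system pipelines"

ARCHITECTURE = {
    Workload.CONTINUOUS_STATE: "solo",
    Workload.INDEPENDENT_RETRIES: "market",
    Workload.CLEAN_COORDINATION: "hub-spoke",
}

assert ARCHITECTURE[Workload.CONTINUOUS_STATE] == "solo"
```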

The questions to put to a vendor:

  • Which workloads in our org are continuous-state (use solo)? Which benefit from independent retries (use markets)? Which are clean coordination (use hub-spoke)?
  • Does the platform let us run all three architectures, or does it lock us into one?
  • What is the cost-per-task delta between architectures on a representative workload of ours, measured the way Krishnan measured?
  • When the architecture is wrong for the workload, is the failure mode visible (degraded quality, higher cost), or is it silent?

The first measured argument that one multi-agent architecture is wrong for some workloads is also the first argument that procurement decisions should be workload-typed, not vendor-typed. Buy the architecture that fits the work. If the platform only sells you one shape, it is selling you the workloads it fits, not the ones you have.


This analysis is grounded in Why Smart Planners Lose to Simple Markets (Why Coase Needs Hayek) (Rohit Krishnan / Strange Loop Canon, May 2026).
