Capability Is Worth $26B. Who Measures the Agent's ROI?

Thiago Victorino

On May 27, 2026, Cognition announced a Series D of more than $1 billion at a $26 billion valuation. Lux Capital, General Catalyst, and 8VC led the round. The company reports a $492 million run-rate and enterprise usage up more than 10x since the start of 2026. The product is Devin, an autonomous software engineering agent, and the customer list reads like a tier-one roster: Citi, Mercedes-Benz, Goldman Sachs, Santander, Itau, Dell, Elevance, Infosys, Cognizant, plus the US Army and US Navy.

The capability is now capitalized. A market that priced doubt eighteen months ago just priced conviction at $26 billion. The interesting question is not whether Devin can write code. The market has answered that. The question is whether the buyers can prove what they bought.

The Number That Is Missing From the Announcement

Read Cognition’s announcement and you find impressive figures. Mercedes-Benz reportedly compressed an eight-month legacy modernization project into eight days. Itau reports Devin auto-fixing 70% of security vulnerabilities. Internally, Cognition says Devin accounts for 89% of the code committed by its own engineering team.

Every one of those numbers is a vendor claim. That is not an accusation; it is a category. The Mercedes figure and the Itau figure are self-reported, unverified by any third party. The 89% internal figure is Cognition measuring Cognition, which tells you the tool works for the people who built it, in the environment they built it for. None of it tells a buyer at Citi or Santander the one thing that matters at procurement: how does Devin perform against my own engineers, on my own work, on one shared metric?

That number is missing. Not because Cognition hid it. Because almost nobody is set up to produce it.

Capability Is Not ROI

There is a quiet substitution happening in every enterprise AI deal right now. The vendor demonstrates capability. The buyer infers ROI. Those are not the same thing, and the distance between them is where budgets go to die.

Capability is “Devin closed the ticket.” ROI is “Devin closed the ticket faster, cleaner, and with less rework than the path we were already on, and here is the throughput, defect-rate, and rework data that proves it across a quarter.” The first is a demo. The second is evidence. A $26 billion valuation rests on the assumption that capability converts to ROI inside customer environments at scale. That assumption is currently an act of trust.

Trust is fine as a starting position. It is a dangerous permanent position. The eight-months-to-eight-days story is the kind of figure that wins a board meeting and loses a renewal, because the renewal happens after the buyer has lived with the rework, the context-loading cost, the cases where the agent confidently shipped something wrong. If the only measurement is the vendor’s, the buyer is flying on the vendor’s instruments.

The Scoreboard Test

We have argued before that the right unit of measurement in the agent era is the team, not the model, and that capability is becoming commodity while orchestration becomes the moat. The Cognition raise is the sharpest illustration yet of why that framing matters at the budget level.

Here is the test any enterprise signing a Devin contract should be able to pass. Put your human engineers and your agent on one scoreboard. Same rows: throughput, defect rate, rework rate, cycle time, escaped defects. Same period. Same definition of “done.” Then look at the columns side by side.
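The scoreboard test can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the WorkItem record, its field names, the rule that "done" means "has a finish date", and the sample data in the test are all hypothetical. What it shows is the structural point: one definition of done, one set of rows, and two columns computed exactly the same way.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median
from typing import Optional

@dataclass
class WorkItem:
    """One unit of work, whoever did it. Hypothetical record; field names are illustrative."""
    owner: str                # "human" or "agent"
    started: date
    finished: Optional[date]  # None means not done yet
    defects: int              # defects caught before release
    escaped: int              # defects found after release (escaped defects)
    reworked: bool            # had to be redone after being marked done

def is_done(item: WorkItem) -> bool:
    # One definition of "done", applied identically to humans and agents.
    # Assumption: "done" = has a finish date; a real team would encode its own rule here.
    return item.finished is not None

def scoreboard(items: list, owner: str, weeks: float) -> Optional[dict]:
    """The five shared rows for one column (owner) over the same period."""
    done = [i for i in items if i.owner == owner and is_done(i)]
    if not done:
        return None  # an empty column is itself a finding
    n = len(done)
    return {
        "throughput": n / weeks,                               # done items per week
        "defect_rate": sum(i.defects for i in done) / n,       # pre-release defects per item
        "rework_rate": sum(i.reworked for i in done) / n,      # share of items redone
        "cycle_time": median((i.finished - i.started).days for i in done),  # median days
        "escaped_defects": sum(i.escaped for i in done),       # total escaped defects
    }

def print_side_by_side(items: list, weeks: float) -> None:
    # Same rows, same period, two columns.
    human = scoreboard(items, "human", weeks) or {}
    agent = scoreboard(items, "agent", weeks) or {}
    for row in ("throughput", "defect_rate", "rework_rate", "cycle_time", "escaped_defects"):
        print(f"{row:<16} human={human.get(row, '-')!s:<8} agent={agent.get(row, '-')!s:<8}")
```

The design choice that matters is that neither is_done nor scoreboard has a special case for humans or agents. If either column needs its own rule, the rows are no longer shared and the comparison stops meaning anything.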

If you cannot build that scoreboard, you are not measuring the agent. You are measuring your faith in the announcement. And the moment you cannot show the agent’s output next to your team’s output on shared rows, you have lost the ability to answer the only question your CFO will ask in twelve months: did the tool behind a $26 billion valuation actually beat the baseline we already had?

A second-order point hides here. Most enterprises cannot pass this test for their human teams either. They do not measure throughput, defect rate, and rework on one consistent surface. So when the agent arrives, there is no baseline to compare it against. The agent does not create the measurement deficit. It exposes one that was already there, and makes it expensive.

Why This Prices the Next 24 Months

Capability has been capitalized. That stage is over. The next stage is verification, and it will separate the buyers who measured from the buyers who trusted.

The buyers who built the shared scoreboard before they deployed Devin will know, by quarter’s end, exactly what they bought: a multiplier with a number attached, or a cost with a story attached. They will renew on evidence or cut on evidence. Either way they control the decision.

The buyers who deployed on the strength of the announcement will spend the next 24 months arguing internally about whether it worked, because nobody instrumented the before and the after. That argument is not won with data. It is won by whoever has the most political capital, which is the worst way to make a capital allocation decision.

Governance gates exist to keep enterprises from betting on the unverified. Measurement is the gate that matters most here, because it is the one that converts a $26 billion bet into a $26 billion fact, or exposes a bad bet before the renewal does.

Do This Now

Before you sign or expand any autonomous-agent contract, build the one-scoreboard test. Pick five metrics: throughput, defect rate, rework rate, cycle time, escaped defects. Define “done” once, for both humans and agents. Capture a baseline from your human team for the prior quarter, even a rough one. Then run the agent on the same rows for one quarter and put the columns side by side.

If you cannot capture the human baseline, that is your first finding, and it is more urgent than the agent decision. You are about to buy autonomous labor you have no way to grade, into a team you have no way to grade. Fix the scoreboard first. The capability is real and it is expensive. Make it prove itself on the same surface as the team you already pay for.
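The quarter-over-quarter comparison in these steps can also be sketched, again hypothetically. The function below assumes each quarter has already been reduced to a dict with the same five metric keys (the key names are illustrative), and it assumes only throughput is higher-is-better while the other four rows are lower-is-better. Its one deliberate behavior mirrors the first finding above: with no human baseline, it refuses to grade the agent at all.

```python
from typing import Optional

# Direction of "better" for each row on the shared scoreboard.
# Assumption: throughput is higher-is-better; the other four are lower-is-better.
HIGHER_IS_BETTER = {
    "throughput": True,
    "defect_rate": False,
    "rework_rate": False,
    "cycle_time": False,
    "escaped_defects": False,
}

def compare_quarters(baseline: Optional[dict], agent: dict) -> list:
    """Compare an agent quarter against the prior-quarter human baseline, row by row."""
    if not baseline:
        # No human baseline: that is the first finding, ahead of any agent verdict.
        return ["FINDING: no human baseline; fix the scoreboard before judging the agent"]
    lines = []
    for metric, higher_better in HIGHER_IS_BETTER.items():
        if metric not in baseline or metric not in agent:
            lines.append(f"{metric}: missing on one side; rows are not shared")
            continue
        base, ag = baseline[metric], agent[metric]
        if ag == base:
            verdict = "same"
        elif (ag > base) == higher_better:
            verdict = "agent better"
        else:
            verdict = "agent worse"
        lines.append(f"{metric}: baseline={base} agent={ag} -> {verdict}")
    return lines
```

A mixed result (faster cycle time, more escaped defects) is the normal case, which is why the output is one verdict per row rather than a single score.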

The market has priced the capability at $26 billion. Price the ROI yourself.


This analysis synthesizes Cognition's Series D announcement (Cognition, May 2026).

Victorino Group helps enterprises measure humans and AI agents on one shared scoreboard before they bet the budget.

All articles on The Thinking Wire are written with the assistance of Anthropic's Opus LLM. Each piece goes through multi-agent research to verify facts and surface contradictions, followed by human review and approval before publication. If you find any inaccurate information or wish to contact our editorial team, please reach out at editorial@victorinollc.com.
