Operating AI

When AI Builds at the Speed of Thought, Who Decides What Gets Shipped?

Thiago Victorino
10 min read

Zach Wills, Senior Director of Engineering and Applied AI at Luxury Presence, runs roughly 60 autonomous AI agents. Not as a demo. Not as a research project. In production, on a real codebase, shipping real software.

One morning he woke up to 77 pull requests. None of them were requested. The agents had autonomously identified work that needed doing --- bug fixes, feature improvements, code cleanup --- and done it overnight.

This sounds like a productivity fantasy. And for a certain class of problem, it is. His team now averages 15-20 AI-generated pull requests per day. The gap between having an idea and seeing it implemented has collapsed from weeks to hours, sometimes minutes.

But there is a number in Wills’ data that deserves more attention than the 77: the rejection rate is 33%.

One in three things these agents build gets thrown away.

The Rejection Rate Is the Story

Most coverage of autonomous AI development leads with the impressive numbers. Dozens of agents. Overnight pull requests. Speed-of-thought implementation. And these numbers are real. They represent a genuine shift in what is possible.

But a 33% rejection rate on autonomous output is not a footnote. It is the central fact.

If a human engineering team had a 33% rejection rate on code review, you would call an all-hands meeting. You would question the hiring pipeline, the specification process, the architectural standards. You would not celebrate the volume and ignore the waste.

When autonomous agents produce rejected work, they consume compute, engineering review time, and CI/CD resources. They create noise in the codebase. They generate merge conflicts that slow down the work that does pass review. A rejected pull request is not free. It costs real money and real attention.

Wills understands this. His system includes review processes, quality gates, and human oversight. The 33% rejection rate is not a failure --- it is the cost of the governance system doing its job. Without that governance, the rejection rate would not be lower. It would be invisible. Bad code would ship.

The question every organization should be asking is not “how do we get to 77 PRs overnight?” It is “do we have the review infrastructure to tell which 51 are good and which 26 are waste?”
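The 51/26 split is just the one-in-three rejection rate applied to that overnight batch, sketched here for the skeptical reader:

```python
# Wills' reported figures: 77 overnight PRs, roughly one in three rejected.
total_prs = 77
rejected = round(total_prs / 3)   # one-in-three rejection rate -> 26
accepted = total_prs - rejected   # -> 51

print(accepted, rejected)  # 51 26
```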

The Bottleneck Migration

Here is the pattern that matters: as AI eliminates each bottleneck, the constraint moves somewhere harder.

First, the bottleneck was writing code. AI agents removed that. Now the bottleneck is reviewing code. So organizations will try to automate review. Then the bottleneck becomes defining what should be built. And that bottleneck --- judgment, strategy, architectural coherence --- does not yield to automation. It yields to organizational capability.

Wills describes this explicitly. His team’s constraint is no longer engineering capacity. It is orchestration: deciding what the agents should work on, reviewing what they produce, and maintaining coherence across a codebase that changes faster than any human can track.

Nolan Lawson, a developer at Microsoft, frames this more bluntly. He observes that programmers are becoming reviewers --- spending their days approving or rejecting AI-generated code rather than creating it. The craft shifts from building to judging. The question is whether organizations are prepared for a world where judgment, not execution, is the scarce resource.

The Dark Factory and Its Price Tag

StrongDM, an infrastructure access company, has gone further than most. Simon Willison documented their approach in February 2026: two governing principles for their AI-assisted development. The first is that code must not be written by humans. The second is that code must not be reviewed by humans.

They call this the Dark Factory. No human hands touch the code at any point. AI writes it. AI reviews it. AI tests it in what they call the Digital Twin Universe --- a complete simulation of their production environment.

The ambition is striking. So is the cost: roughly $1,000 per day per engineer in AI token spend.

This number punctures the assumption that AI development is cheap. It can be fast. It can be powerful. But the token costs, the infrastructure for testing, the simulation environments, the orchestration systems --- these are real expenses. At $1,000 per engineer per day, a 20-person engineering team spends $5 million per year on AI tokens alone. This is not “free execution.” It is a different cost structure with different trade-offs.
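The annualized figure follows from simple arithmetic. The sketch below assumes roughly 250 working days per year, which is the assumption implied by the $5 million total; a 365-day assumption would push it past $7 million:

```python
# StrongDM's reported figure, annualized for a 20-person team.
daily_token_spend_per_engineer = 1_000   # USD per engineer per day
engineers = 20
working_days_per_year = 250              # assumption implied by the article's math

annual_spend = daily_token_spend_per_engineer * engineers * working_days_per_year
print(f"${annual_spend:,}")  # $5,000,000
```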

And it raises a question that few organizations are asking carefully enough: what happens when the AI reviewer is wrong?

StrongDM’s approach depends on the assumption that their testing environment faithfully represents production. That their AI reviewer catches what a human reviewer would catch. That the Digital Twin Universe does not contain subtle divergences from reality. These are engineering assumptions, and they may be good ones. But they are assumptions, not certainties. And when the cost of being wrong is a security vulnerability in an infrastructure access product, the stakes are not theoretical.

The Recursive Speed Problem

Wills identifies something that deserves its own name: the recursive speed problem.

When execution becomes near-free, every idea becomes worth trying. This generates more ideas, which generate more execution, which generates more ideas. The feedback loop accelerates. Your backlog does not shrink. It grows. It grows faster than before, because the very act of building reveals adjacent possibilities that were previously invisible.

The old question was: “Is this worth building?” The economics demanded a high bar. Engineering time was expensive. Ideas competed for limited capacity.

The new question, as Wills frames it, is: “Is this worth NOT building?” When the cost of implementation approaches zero, the opportunity cost of inaction starts to dominate. The calculus inverts.

This sounds liberating. It is, in fact, a governance crisis.

When everything is cheap to build, everything gets built. Without a system for deciding what matters, organizations drown in their own output. Features ship without strategy. Technical debt accumulates at machine speed. The codebase becomes a sprawling collection of experiments, most of which should never have been started.

Speed without direction is not velocity. It is Brownian motion --- energetic, expensive, and going nowhere.

Stanford Asks the Right Question

In February 2026, Stanford Law School published a paper with a title that captures the issue precisely: “Built by Agents, Tested by Agents, Trusted by Whom?”

The trust question is the governance question. When autonomous agents build and test software, the traditional chain of accountability breaks. A human developer who writes buggy code is accountable. A human reviewer who approves it shares the accountability. But when an agent writes code and another agent reviews it, who is accountable for what ships?

This is not a philosophical abstraction. It is a liability question, a regulatory question, and an insurance question. Companies shipping software into regulated industries --- finance, healthcare, automotive --- need answers. And most do not have them.

The technical capability to build autonomously has outpaced the organizational capability to govern autonomously. This gap is where the real risk lives.

The Competency Gap Is Organizational

Wills lists his 2026 predictions, and one stands out: the competency gap between organizations that can operate autonomous agents and those that cannot will widen dramatically.

This is not a technology gap. The tools are available to everyone. Wills uses Claude Code. StrongDM uses Claude. The models are commercial products. The APIs are public. Any organization can spin up 60 agents tomorrow.

But having agents is not the same as governing agents. The organizations that pull ahead will not be the ones with the most agents. They will be the ones with the best systems for directing, reviewing, and maintaining coherence across autonomous output.

This requires capabilities that most organizations have not built: review processes designed for machine-speed output, quality gates that operate automatically, architectural guardrails that agents cannot violate, feedback loops that improve agent performance over time, and escalation protocols for decisions that require human judgment.

These are organizational capabilities, not technical features. You cannot install them. You have to build them. And building them takes the one thing that AI has not made cheaper: clear thinking about what your organization actually needs.
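To make "quality gates that operate automatically" and "escalation protocols" concrete, here is a minimal, hypothetical sketch of an automated triage gate for agent-generated pull requests. Every name, field, and threshold below is an illustrative assumption, not a description of Wills' or StrongDM's actual systems:

```python
from dataclasses import dataclass, field

# Hypothetical guardrail: paths agents may not merge into without a human.
PROTECTED_PREFIXES = ("auth/", "billing/")

@dataclass
class PullRequest:
    # Illustrative fields; a real system would pull these from CI metadata.
    pr_id: int
    tests_passed: bool
    coverage_delta: float                          # test-coverage change, in points
    touched_paths: list = field(default_factory=list)

def triage(pr: PullRequest) -> str:
    """Return 'merge', 'reject', or 'escalate' for an agent-generated PR."""
    if not pr.tests_passed:
        return "reject"        # hard quality gate: failing tests never merge
    if any(p.startswith(PROTECTED_PREFIXES) for p in pr.touched_paths):
        return "escalate"      # architectural guardrail: human judgment required
    if pr.coverage_delta < 0:
        return "escalate"      # coverage regressed: route to human review
    return "merge"

print(triage(PullRequest(1, tests_passed=True, coverage_delta=0.2)))  # merge
```

The point of the sketch is the shape, not the checks: the gate turns an unbounded stream of machine-speed output into three bounded queues, and only one of them consumes scarce human judgment.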

What This Means For Your Organization

The speed is real. Autonomous AI development produces output at a pace that makes traditional engineering velocity irrelevant. If your planning still assumes weeks-per-feature timelines, you are managing against an obsolete constraint.

The waste is also real. A 33% rejection rate on AI-generated code means one-third of your AI compute, review time, and CI resources produce nothing of value. The governance system that catches bad output is not overhead. It is the mechanism that converts speed into value.

Review is the new bottleneck. Your best engineers should not be writing code. They should be reviewing it, defining architectural standards, and building the systems that make automated review reliable. If your engineering culture still rewards lines of code over judgment calls, you are optimizing for a constraint that no longer binds.

Cost structures are shifting, not disappearing. StrongDM’s $1,000/day per engineer shows that autonomous development is not free. It trades labor costs for compute costs, human review for AI review, slow-and-deliberate for fast-and-filtered. Understand the new cost structure before committing to it.

Governance is the competitive advantage. The organizations that capture the most value from autonomous AI development will not be the ones that move fastest. They will be the ones that move fastest while maintaining coherence, quality, and accountability. Speed without governance is just expensive chaos.

Wills calls this a people problem wearing technology clothes. He is right. The technology works. The question is whether organizations can build the judgment layer that makes the technology worth using.

The agents are ready. The question is whether you are.


Sources

  • Zach Wills. “Building at the Speed of Thought.” February 2, 2026.
  • Simon Willison. “StrongDM’s Dark Factory approach.” simonwillison.net, February 7, 2026.
  • Nolan Lawson. “We Mourn Our Craft.” February 7, 2026.
  • Stanford Law School. “Built by Agents, Tested by Agents, Trusted by Whom?” February 8, 2026.
  • Zach Wills. “2026 AI Bets.” 2026.

Victorino Group helps organizations build the governance layer that turns autonomous AI speed into reliable business outcomes. If your team is scaling AI agents and needs the review infrastructure to make it work, reach out at contact@victorinollc.com or visit www.victorinollc.com.

If this resonates, let's talk

We help companies implement AI without losing control.

Schedule a Conversation