The Thinking Wire
You Are Not Killing Code Review. You Are Renaming Governance.
Ankit Jain, CEO of Aviator, wants to kill code review. His argument, published on Latent.Space, runs like this: AI-generated code is arriving too fast for human reviewers to keep up. Traditional review is a bottleneck. The solution is to move oversight upstream to spec review and replace human judgment with five layers of automated verification.
The diagnosis is correct. We have been making it for months. But the prescription does not follow from Jain’s own evidence, his own cited framework, or even his own proposed system. What he describes as “killing code review” is rebuilding governance with different vocabulary.
The Swiss Cheese Problem
Jain cites James Reason’s Swiss cheese model of accident prevention. This is a good instinct. Reason’s model is one of the most durable ideas in safety engineering. The core insight: no single layer of defense is perfect. Every layer has holes. Safety comes from stacking imperfect layers so the holes never align.
Then Jain argues for removing a layer.
His five-layer replacement (competing agents, deterministic guardrails, BDD acceptance criteria, permission scoping, adversarial verification) is presented as superior to code review. But Reason’s model does not say “replace weak layers with better ones.” It says “add more layers, because you cannot predict where any single layer will fail.” The entire intellectual foundation of the Swiss cheese model argues against reducing the number of verification surfaces.
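The arithmetic behind Reason's model makes the point concrete. The sketch below is illustrative only: the per-layer miss rates are hypothetical, and it assumes layers fail independently (in practice, correlated layers help less, which cuts against automated-only stacks even harder).

```python
# Illustrative arithmetic for Reason's Swiss cheese model.
# Miss rates are hypothetical; the point is the structure, not the numbers.
# If layers fail independently, a defect escapes only when it slips through
# every layer, so the escape probability multiplies across layers.

def escape_probability(miss_rates):
    """P(defect escapes) = product of per-layer miss rates (independence assumed)."""
    p = 1.0
    for rate in miss_rates:
        p *= rate
    return p

automated_layers = [0.10, 0.15, 0.20, 0.25, 0.30]  # five automated layers
with_review = automated_layers + [0.40]            # add an imperfect human layer

print(f"five automated layers: {escape_probability(automated_layers):.6f}")
print(f"plus human review:     {escape_probability(with_review):.6f}")
```

Even a leaky human layer that misses 40% of defects cuts the escape rate further. Under this model, removing any layer can only increase escapes; that is the whole argument for stacking.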
This is not a minor logical inconsistency. It is the thesis contradicting its own framework.
The Data He Left Out
Jain’s case rests heavily on a Faros.ai study of 10,000+ developers across 1,255 engineering teams. He cites the productivity numbers: +21% task completion with high AI adoption, +98% PRs merged. These sound impressive. They are also incomplete.
The same Faros.ai report found +9% bugs per developer and +154% average PR size with high AI adoption. Jain does not mention either figure.
The report’s headline finding is even more damaging to his argument: Faros found no significant correlation between AI adoption levels and company-level performance improvements. More AI usage did not translate to better outcomes at the organizational level. As we noted in When AI Builds Itself, the +98% PR increase paired with +91% longer review times tells a story about volume outpacing quality infrastructure, not about review being unnecessary.
Cherry-picking the productivity numbers while omitting the bug increase, the PR bloat, and the null result on company-level outcomes is salesmanship dressed as analysis.
What Code Review Actually Does
The argument for killing code review rests on a narrow definition of what review accomplishes. Jain frames it primarily as bug-finding. If automated tools can find bugs better (and they can, in many categories), then human review becomes redundant.
This misses 75% of the picture. A 2024 Springer study of code review practices found that three-quarters of defects identified in review affect evolvability and maintainability, not functionality. Design coherence. Naming conventions. Architectural consistency. Coupling between modules. Technical debt accumulation. These are the concerns that determine whether a codebase remains workable in six months or becomes a system nobody wants to touch.
None of Jain’s five automated layers address evolvability. Competing agents verify functional correctness. Deterministic guardrails enforce rules. BDD tests confirm behavior. Permission scoping limits blast radius. Adversarial verification probes for edge cases. All of these target “does it work right now?” None of them ask “will this codebase still be comprehensible in a year?”
Code review also serves functions that have nothing to do with defect detection: knowledge transfer across the team, mentoring junior engineers, building shared understanding of system architecture, establishing coding norms. These are organizational functions. No automated layer replaces them.
The BDD Bet
Jain’s third layer is BDD (Behavior-Driven Development) acceptance criteria, and he leans on it hard. The idea: define behavior specifications upfront, then verify generated code against those specs automatically.
There are two problems. First, academic research on BDD at scale is thin. A 2021 ScienceDirect review found that “empirical evidence of BDD’s usefulness in large-scale projects is missing.” BDD works well for well-defined, bounded behaviors. Enterprise systems full of complex state interactions, edge cases, and cross-cutting concerns are a different challenge.
Second, and more fundamental: when AI generates both the implementation and the tests, passing specs do not guarantee correct software. Simon Willison raised exactly this concern. If the same AI produces the code and the verification, you have correlated failure modes. A systematic bias in the model will produce code that is wrong in exactly the way the model’s tests expect. An ArXiv paper on spec-driven development makes this explicit: “specs do NOT replace code review” and “passing spec tests don’t guarantee correct software.”
This is the verification recursion problem. Who watches the watchers? Jain’s answer is more automated watchers. But correlated automation does not produce independent verification.
Follow the Product
There is a commercial dimension worth noting. Aviator is a YC-backed startup ($2.42M raised) that builds developer workflow automation tools. Jain’s five-layer framework maps neatly to Aviator’s product line. Layer 3, BDD verification, maps directly to Aviator Verify.
This does not automatically invalidate the argument. Founders with products in a space often see problems clearly because they spend their days on them. But it does mean the incentive structure favors a conclusion where code review (done by engineers already on the payroll) is replaced by automated verification layers (which someone needs to sell you).
Even the Latent.Space editor appended a note to the piece: “I am not there yet.” Worth listening to.
The Cost Nobody Mentions
The proposed five-layer system is not cheap to run. Simon Willison estimates that heavy AI-assisted development already costs roughly $1,000 per day per engineer in API fees. Adding competing agents (Layer 1), adversarial verification agents (Layer 5), and continuous BDD testing against AI-generated specs multiplies that cost.
For a 50-person engineering team, the verification infrastructure alone could exceed the cost of the human reviewers it replaces. And it introduces a new dependency: your ability to ship software now depends on the uptime, pricing stability, and continued accuracy of multiple AI services.
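A back-of-envelope sketch using Willison's cited figure makes the scale visible. The workday count and the verification multiplier below are assumptions for illustration, not measured values:

```python
# Back-of-envelope cost sketch using the cited ~$1,000/day/engineer estimate.
# WORKDAYS_PER_YEAR and VERIFICATION_MULTIPLIER are ASSUMPTIONS for illustration.

ENGINEERS = 50
API_COST_PER_DAY = 1_000        # USD, from Willison's cited estimate
WORKDAYS_PER_YEAR = 250         # assumption
VERIFICATION_MULTIPLIER = 2.0   # assumption: competing agents, adversarial agents, BDD runs

baseline = ENGINEERS * API_COST_PER_DAY * WORKDAYS_PER_YEAR
with_layers = baseline * VERIFICATION_MULTIPLIER

print(f"baseline AI assist:     ${baseline:,.0f}/year")    # $12,500,000/year
print(f"with verification stack: ${with_layers:,.0f}/year")  # $25,000,000/year
```

Whatever multiplier you plug in, the verification stack is a recurring, usage-priced line item, which is exactly the dependency risk the paragraph above describes.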
This is a trade-off worth discussing openly. Jain does not discuss it at all.
What He Actually Built
Here is the thing that makes this argument frustrating: Jain’s five-layer system is genuinely interesting. Competing agents that verify each other’s work. Deterministic guardrails that catch known categories of error. Permission scoping that limits what generated code can touch. Adversarial verification that probes for failure modes.
This is governance infrastructure. It is oversight architecture. It is exactly the kind of systematic verification that organizations need as AI-generated code volume increases.
But Jain frames it as the death of code review rather than the evolution of code review. The framing is wrong, and the wrong framing leads to wrong conclusions. If you think you killed review, you stop investing in the human judgment layer. You lose the knowledge transfer. You lose the architectural coherence checks. You lose the mentoring. You lose the 75% of review value that automated tools do not capture.
As we explored in The Benchmark Paradox, AI code review tools are useful precisely when they complement human judgment, not when they replace it. The BugBot analysis reached the same conclusion from a different direction: layered approaches work, but only when human review remains in the stack.
What Actually Works
The volume problem is real. AI generates code faster than humans can review it. That is not going away. The answer is not “kill review” or “review everything the old way.” The answer is governed review: risk-based triage that directs human attention where it matters most.
Automated pre-screening. Let AI tools handle the categories they are good at: style consistency, known vulnerability patterns, test coverage, type safety. This is Jain’s Layer 2, and it works. No argument here.
Risk-based routing. Not all code changes carry equal risk. A CSS tweak and a payment processing change should not go through the same review process. Route changes by risk profile: security-sensitive code, data-handling logic, and architectural changes get thorough human review. Low-risk changes get automated verification with spot-check sampling.
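Risk-based routing is simple enough to sketch. Everything in the snippet below (path patterns, the size threshold, the lane names) is a hypothetical policy, not a prescription; the point is that triage logic is a small, auditable artifact:

```python
# Minimal sketch of risk-based review routing. Path prefixes, thresholds,
# and lane names are hypothetical; adapt them to your own repo and policy.
from dataclasses import dataclass

HIGH_RISK_PREFIXES = ("payments/", "auth/", "migrations/")

@dataclass
class Change:
    path: str
    lines_changed: int

def route(change: Change) -> str:
    """Return the review lane for a single change."""
    if change.path.startswith(HIGH_RISK_PREFIXES):
        return "human-review"          # security/data-sensitive: thorough human review
    if change.lines_changed > 300:
        return "human-review"          # large diffs are where automated checks miss most
    return "automated+spot-check"      # low risk: automated verification, sampled audits

print(route(Change("payments/refund.py", 12)))  # human-review
print(route(Change("ui/theme.css", 4)))         # automated+spot-check
```

In practice the routing table would live in version control next to the code it governs, so the triage policy itself gets reviewed.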
Architecture review, not line review. Human reviewers should spend less time on individual lines and more time on design decisions, system boundaries, and integration patterns. This is where human judgment remains irreplaceable, and it is where the 75% evolvability value lives.
Verification independence. When AI generates code, the verification layer must be genuinely independent. That means human reviewers, separate AI models, or formally specified properties. Not the same model checking its own work.
This approach does not “kill” anything. It allocates human attention efficiently within a governed framework. The review process changes shape. It does not disappear.
The Name Matters
Jain might argue this is semantic. “Call it whatever you want. The old process is dead. The new process is better.” But naming matters. When you tell engineering organizations that code review is dead, some of them will hear “we do not need oversight.” They will cut review headcount. They will skip architectural review because the AI passed its own tests. They will ship faster, and the damage will be invisible until the codebase becomes unmaintainable and the institutional knowledge has evaporated.
The Faros.ai data already shows this pattern forming: more code, more PRs, more bugs, longer reviews, no improvement at the company level. That is what “killing review” looks like in practice.
You are not killing code review. You are renaming governance. And the name you choose determines whether organizations invest in the oversight they need or dismantle it in the name of velocity.
This analysis responds to Ankit Jain’s “How to Kill the Code Review” (March 2026) and synthesizes data from Faros.ai’s AI Productivity Paradox Report (2025), LinearB’s 2026 Software Engineering Benchmarks, and METR’s AI Developer Productivity Study (2025).
Victorino Group helps engineering organizations build AI systems they can actually govern. Let’s talk.