The Acceleration Whiplash: When Verification Becomes the Job

Per-developer defect rate rose 9% in 2025 and 54% by 2026. Code churn is up 861%. Median time in review is up 441.5%. Pull requests merged with zero human review climbed 31.3%, and the ratio of incidents to PRs rose 242.7%. These numbers come from Faros AI’s “Acceleration Whiplash” report, built on two years of engineering telemetry from 22,000 developers across more than 4,000 teams. It is the first large-N measurement of what happens to software quality when AI writes a growing share of the code.

We have argued the mechanism before. In The Verification Tax we showed the time saved by AI nearly cancels the time spent checking it. In Cheap Code, Expensive Quality we traced where the cost lands. The Faros data is the proof, at a scale no survey can match. So this piece does not re-argue the thesis. It reads the curve.

What the telemetry actually shows

The headline is not that throughput went up. It did. Teams ship more code, faster, with AI in the loop. The problem is what rides along with the extra volume.

Faros measured the second-order effects that throughput dashboards hide. Code churn, meaning lines rewritten or reverted shortly after merge, rose 861%. That is the signature of code that compiled, passed a glance, shipped, and then had to be undone. Median time in review rose 441.5%, because larger and stranger changesets take longer to understand. The defect rate per developer rose 54% by 2026, up from a 9% bump the year before. And the incidents-to-PR ratio rose 242.7%, which means more of what merges turns into a production problem.

Two of those numbers belong together. PRs merged with zero review rose 31.3% over the same period in which roughly a quarter of all PRs came to be reviewed by an AI agent rather than a person. The reviewing capacity did not vanish. It shifted to machines and, in a growing slice of cases, to nobody.

The curve is the story

The most important finding in the Faros data is the shape of the curve, not the size of any one number. The damage steepens as adoption deepens.

Faros frames this as a whiplash: throughput accelerates first, and the quality cost arrives a beat later, harder, the more AI you add. The 9% defect bump in 2025 becoming 54% in 2026 is not a one-time adjustment to a new tool. It is a trajectory. The more code AI writes, the faster the unreviewed, high-churn, incident-prone fraction grows. Adoption and reliability debt are moving in the same direction, and the debt is moving faster.

This is why a snapshot misleads. A team that measures itself today and sees higher output will conclude AI is working. The telemetry says the bill compounds on a delay. By the time incidents make it visible, the team has already restructured its habits around the higher throughput.

As Faros puts it, “throughput measures what was shipped, not what survived.” A velocity chart counts the first. Production counts the second.

The senior-engineer tax

The cost concentrates unevenly. It lands on the people best equipped to absorb it, which is exactly why it stays invisible to dashboards that track output.

For senior engineers, Faros found average time in code review rose 199.6% and median time to first review rose 156.6%. A 199.6% rise is close to a threefold increase, so the most experienced people on the team are now spending roughly triple the time reading code, much of it generated, to decide whether it is safe to keep. Their output of new code may look flat or down. That is not a productivity problem. It is a reassignment. The senior engineer has been moved off authorship and onto verification, and the org chart has not caught up to the change.

This is the practical face of the whiplash. The junior or AI-assisted contributor produces more. A senior absorbs the review load that volume creates. The team’s net velocity looks positive on a chart that only counts merges, while the most expensive judgment in the building is being consumed by triage.

A sensor, not a verdict

The obvious response is to point more AI at the problem: if humans cannot review the volume, let agents review it. About a quarter of PRs already work this way. Addy Osmani, who works at Google, examined this directly in his essay on agentic code review, and his framing is the right one. AI review is “a sensor, not a verdict.”

A sensor flags. A verdict decides. Treating an agent’s approval as a decision is where teams get hurt, because the failure mode is correlated. Osmani warns of “borrowed confidence” in model-on-model loops: the same model that wrote the code, or one trained on similar data, shares its blind spots. When the author and the reviewer make the same mistakes, a green check is not independent confirmation. It is the same guess, twice.

The supporting data is unkind here. Osmani cites a December 2025 CodeRabbit study of 470 PRs in which AI-authored changes carried 1.7 times more issues than human-authored ones, roughly 75% more logic and correctness problems, and 1.5 to 2 times more security issues. The code that most needs an independent reviewer is the code least likely to get one, if the reviewer is another model.

Osmani’s line is the thesis in one sentence: “we made writing cheap, and understanding stayed exactly as expensive as it has always been.” Generation got a discount. Comprehension did not. The whiplash is the gap between those two prices, measured at scale.

A note on the source

Faros AI sells engineering-intelligence tooling, so its conclusion that you need observability to see this carries a vendor incentive. Read it with that in mind. The method, however, is stronger than the survey-based studies that dominate this debate, including DORA. Faros is reading two years of actual telemetry from real repositories, not asking people to estimate how they feel about AI. Self-report inflates and deflates with mood and politics. Commit history does not. The directional finding holds independent of who profits from naming it.

Do this now

Add one ratio to your engineering dashboard this week: incidents per merged PR, tracked over time, segmented by whether the PR had human review. If that number is climbing while throughput climbs, you have the whiplash, and you can see it before your customers do.
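The ratio described above can be sketched in a few lines. This is a minimal illustration, not Faros’s method: the `MergedPR` record, its field names, and the assumption that each incident has already been attributed to a single merged PR are all hypothetical. In practice, that attribution step is the hard part and depends on your incident tooling.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class MergedPR:
    """Hypothetical record; field names are assumptions for illustration."""
    pr_id: int
    merged_on: date
    human_reviewed: bool  # True if at least one person reviewed the PR
    incidents: int        # production incidents later attributed to this PR


def incidents_per_pr(prs):
    """Return {(YYYY-MM, human_reviewed): incidents per merged PR}."""
    totals = defaultdict(lambda: [0, 0])  # [incident count, PR count]
    for pr in prs:
        key = (pr.merged_on.strftime("%Y-%m"), pr.human_reviewed)
        totals[key][0] += pr.incidents
        totals[key][1] += 1
    return {key: inc / count for key, (inc, count) in sorted(totals.items())}


if __name__ == "__main__":
    # Made-up sample data, only to show the shape of the output.
    sample = [
        MergedPR(1, date(2026, 1, 5), True, 0),
        MergedPR(2, date(2026, 1, 9), True, 1),
        MergedPR(3, date(2026, 1, 12), False, 1),
        MergedPR(4, date(2026, 2, 3), False, 2),
        MergedPR(5, date(2026, 2, 7), False, 1),
        MergedPR(6, date(2026, 2, 20), True, 0),
    ]
    for (month, reviewed), ratio in incidents_per_pr(sample).items():
        label = "human-reviewed" if reviewed else "no human review"
        print(f"{month}  {label:<16} {ratio:.2f}")
```

Plot the two segments as separate lines against merged-PR volume. The signal to watch for is the unreviewed line pulling away from the reviewed one while throughput rises.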

Then make a deliberate decision about AI review. Use it as a sensor that triages and flags, never as the verdict that lets a change merge unseen. Reserve human judgment for the changes the sensor cannot vouch for independently, and protect your senior engineers’ review time as the scarce resource it now is. The teams that survive the whiplash are not the ones that generate the most code. They are the ones that can still tell which code survived.
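One way to make the sensor-versus-verdict rule concrete is a merge gate in which the AI reviewer’s output can only escalate a change, never approve it. The sketch below is an assumption-laden illustration: the `Change` fields, the `Reviewer` levels, and the escalation rule (any AI flag or any sensitive path requires a senior) are hypothetical policy choices, not something prescribed by Faros or Osmani.

```python
from dataclasses import dataclass
from enum import Enum


class Reviewer(Enum):
    ANY_HUMAN = "any_human"
    SENIOR = "senior"


@dataclass(frozen=True)
class Change:
    """Hypothetical change record; every field name is an assumption."""
    ai_approved: bool              # the AI reviewer's green check
    ai_flags: int                  # number of issues the AI reviewer raised
    touches_sensitive_paths: bool  # e.g. auth, billing, migrations
    human_approvals: int           # approvals from non-senior engineers
    senior_approvals: int          # approvals from senior engineers


def required_reviewer(change: Change) -> Reviewer:
    """Use the AI signal as a sensor: it can only raise the review bar."""
    if change.ai_flags > 0 or change.touches_sensitive_paths:
        return Reviewer.SENIOR
    return Reviewer.ANY_HUMAN


def may_merge(change: Change) -> bool:
    """The verdict always comes from a person; ai_approved is ignored here."""
    if required_reviewer(change) is Reviewer.SENIOR:
        return change.senior_approvals >= 1
    return change.human_approvals + change.senior_approvals >= 1
```

Note that `ai_approved` never appears in `may_merge`. A clean AI review routes the change to any available human, which keeps routine changes off senior engineers’ queues, but it cannot let the change merge unseen.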

This analysis synthesizes The Acceleration Whiplash (AI Engineering Report 2026) (Faros AI, June 2026) and Agentic Code Review (Addy Osmani, June 2026).

Victorino Group helps teams build the verification layer that keeps AI velocity from becoming reliability debt.