The Week Verification Became Doctrine

Thiago Victorino
6 min read

Verification just got a manifesto. Reproducibility just got a scientific paper. Cross-model review just shipped as a CLI feature.

Three separate artifacts. One week. None of the authors coordinated. That is what a movement looks like when it crosses from argument into doctrine.

We have been writing about verification debt for months. About the tax verification imposes on every AI-assisted team. About how $200M in venture funding is now flowing toward verification as infrastructure. Each post argued the same thing from a different angle: verification is becoming the load-bearing concern.

What changed this week is that we no longer have to argue it.

The Manifesto

The AI Coding Agent Manifesto landed with twelve principles. Four of them matter more than the rest.

Contracts over conventions. Humans define the code skeleton. Types, interfaces, function signatures. The agent fills the implementation. This sounds like a stylistic preference. It is not. It is a boundary. A contract is checkable. A convention is vibes.
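
Here is what the boundary looks like in practice. A minimal sketch, assuming a Python codebase; the RateLimiter contract and its token-bucket implementation are our illustration, not an example from the manifesto:

```python
from typing import Protocol

class RateLimiter(Protocol):
    # The human-authored contract: the agent must honor this signature.
    def allow(self, client_id: str, now: float) -> bool:
        """Return True if client_id may proceed at time `now`."""
        ...

# The agent fills the implementation. The contract stays checkable:
# a type checker and tests written against the Protocol can both verify it.
class TokenBucketLimiter:
    def __init__(self, rate: float, burst: int) -> None:
        self.rate, self.burst = rate, burst
        self._state: dict[str, tuple[float, float]] = {}  # id -> (tokens, last_ts)

    def allow(self, client_id: str, now: float) -> bool:
        tokens, last = self._state.get(client_id, (float(self.burst), now))
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self._state[client_id] = (tokens - 1.0, now)
            return True
        self._state[client_id] = (tokens, now)
        return False
```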

Verification over generation. Generating code is cheap. Proving it correct is expensive. The manifesto says the expensive part is the real work. Anyone can generate a thousand lines. Only a verified thousand lines belong in production.

Separation of generation and judgment. The model that writes the code cannot be the final judge of the code. You need a second opinion. From a human, from tests, from another model. But not from the same voice that just produced the artifact.
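
The separation is small in code and large in consequence. A minimal sketch, with hypothetical model endpoints standing in for any two distinct models; this is not any vendor's API:

```python
def call_model(model: str, prompt: str) -> str:
    # Hypothetical stand-in: wire this to your provider of choice.
    raise NotImplementedError

def generate_and_review(task: str) -> tuple[str, str]:
    # Voice 1 produces the artifact.
    code = call_model("generator-model", f"Implement: {task}")
    # Voice 2 judges it. It never edits; it only renders a verdict.
    verdict = call_model(
        "reviewer-model",
        "Review this code for bugs, edge cases, and architectural issues. "
        "Reply APPROVE or REJECT with reasons:\n" + code,
    )
    return code, verdict
```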

Guilty until proven innocent. This is the line that will make executives uncomfortable. AI-generated code is not trusted by default. It has to earn trust through verification. The burden of proof sits with the code, not with the reviewer.
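
Concretely, the posture can be a gate rather than a meeting. A minimal sketch, assuming a Python repo checked by mypy and pytest; the function name and the choice of tools are ours:

```python
import subprocess

def earn_trust(repo_path: str) -> bool:
    """Untrusted by default; any failing check keeps the default."""
    checks = [
        ["mypy", repo_path],          # the contract holds
        ["pytest", repo_path, "-q"],  # the behavior holds
    ]
    return all(
        subprocess.run(cmd, capture_output=True).returncode == 0
        for cmd in checks
    )
```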

None of these principles are new. What is new is that they are written down, numbered, and shareable. You can forward this document to your staff engineer tomorrow. You could not forward a mood last week.

The Scientific Paper

While the manifesto was circulating, Thinking Machines published something quieter and harder to dismiss. Their paper on defeating nondeterminism in LLM inference makes a claim that most teams have never internalized: large language models are not deterministic even at temperature zero. Not on hosted APIs. Not on your own hardware. Not with open-source inference libraries.

The paper explains why, and it is not a bug you can patch with a config flag. It is a property of how floating-point arithmetic composes across concurrent GPU kernels. The same prompt can produce different outputs on the same hardware. Reproducibility, the foundation of the scientific method, is not a default you get for free.
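
The mechanism fits in four lines. A toy demonstration, ours rather than the paper's: the same three numbers summed in two orders produce two different answers, and concurrent GPU kernels do not promise one order:

```python
a, b, c = 1e16, -1e16, 1.0
print((a + b) + c)  # 1.0
print(a + (b + c))  # 0.0 -- same numbers, different grouping, different result
```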

This matters because verification assumes reproducibility. If you cannot replay an output, you cannot audit it. If you cannot audit it, you cannot govern it. The manifesto tells you what posture to adopt. The paper tells you what the failure mode looks like when you do not.

The paper also proposes mitigations. That part is narrow and technical, and we should not oversell it. What is significant is the framing: nondeterminism is now a documented property, not a folk belief. Teams can point at it. Auditors can ask about it. The conversation has a citation.

The Shipping Tool

The third artifact is not an argument. It is a feature. GitHub released Rubber Duck inside the Copilot CLI: an experimental mode where a second model reviews the first model’s plan and code. It catches architectural issues, bugs, edge cases. It does not interrupt the workflow; it runs alongside it.

Rubber Duck is experimental, and we should say so plainly. It is not the default. It is not in every IDE. It will probably change shape before it stabilizes. But it is the first native cross-model reviewer in a major CLI, and that matters for one reason: it ships the posture of the manifesto. Generation and judgment, separated. A second opinion, by default, from a different voice.

The manifesto says you should do this. The paper says why you need to. Rubber Duck shows what it looks like when the tool does it for you.

What This Changes

For most of the last year, verification has been a preference held by careful engineers. Some teams practiced it. Others did not. Both could point at their shipping velocity and declare victory.

That era is ending. When principles are codified, when the failure mode has a paper, and when the tooling ships the posture, verification stops being a preference and becomes a contract. Organizations without a written verification policy are going to face pressure this quarter. From auditors asking how they govern AI output. From boards asking what their default posture is. From engineers who have read the manifesto and want to know why their company has not.

The vibe-code anti-pattern now has a counter-movement with a document to point at. That is a different conversation than the one we were having in January.

The Honest Caveat

A manifesto is a social artifact. It is not a standard, and no standards body stands behind it. Adoption is not guaranteed, and “guilty until proven innocent” can be paralyzing if applied without judgment. Not every line of code deserves the same scrutiny. Not every team has the reviewer capacity to treat every PR as a courtroom.

The point is not to slow down. The point is to make the posture explicit, so the organization can decide where to spend its verification budget instead of pretending the budget does not exist.

What makes this week real is not the manifesto alone. It is that three different authors shipped the same posture in three different formats in the same week. The argument phase of verification-first engineering is over. The doctrine phase has begun.

Your move is to write down what your team’s verification policy actually is. Not aspirationally. Actually. Because the next person who asks is not going to accept “we figure it out case by case” as an answer.


This analysis synthesizes The AI Coding Agent Manifesto (April 2026), Defeating Nondeterminism in LLM Inference by Thinking Machines (April 2026), GitHub Copilot CLI Combines Model Families for a Second Opinion (April 2026), and Write Less Code, Be More Responsible (April 2026).

Victorino Group helps engineering teams write the verification policy they’ll need to point at when the board asks. Let’s talk.

All articles on The Thinking Wire are written with the assistance of Anthropic's Opus LLM. Each piece goes through multi-agent research to verify facts and surface contradictions, followed by human review and approval before publication. If you find any inaccurate information or wish to contact our editorial team, please reach out at editorial@victorinollc.com. About The Thinking Wire →
