Governed Implementation

Three Views on Governing AI Code: Types, Specs, and Verifiable Trust

Thiago Victorino
10 min read

In a single week of March 2026, three unrelated pieces made three independent arguments about the same problem. Vint Cerf and co-authors wrote in Communications of the ACM that trust must become infrastructure for AI agents. A type theory practitioner argued that dependent types make the compiler a governance tool. And a Haskell expert demonstrated that sufficiently detailed specifications inevitably collapse into code.

They were not responding to each other. They arrived at the same boundary from different directions. That convergence is the signal worth paying attention to.

We have been building toward this argument since February. In The Specification Problem Is the Governance Problem, we showed that formal verification is governance because specifications encode values. In $200M Says Verification Is Infrastructure, we documented the market validating that thesis with institutional capital. What these three new pieces add is a triangulation: the same conclusion reached through network architecture, programming language theory, and practical software engineering. Three disciplines. One finding.

Position One: Trust as Infrastructure

Mallik Tatipamula, David Attermann, and Vinton Cerf published “From Distributed Intelligence to Verifiable Responsibility” in Communications of the ACM. Cerf is not a minor voice here. He co-designed TCP/IP. He was president of ACM. When he describes something as an infrastructure requirement, the claim carries the weight of someone who has built infrastructure that billions of people use daily.

Their argument: the AI-native internet will not scale on distributed intelligence alone. It requires distributed responsibility, made enforceable through verifiable accountability. Autonomous coordination depends on the ability of systems to evaluate the legitimacy of actions at runtime, without reliance on centralized intermediaries or retrospective governance.

Read that last clause again. “Without reliance on centralized intermediaries or retrospective governance.” This is a direct rejection of the dominant compliance model, where a human reviews an AI’s output after the fact and decides whether it was acceptable. Cerf and his co-authors are saying that model does not scale. Trust must be evaluated at runtime, by the infrastructure itself, as actions happen.

The analogy they draw is structural. Routing was the infrastructure function that made Web 1.0 work. Orchestration was the infrastructure function that made Web 2.0 work. Trust is the infrastructure function that will make the AI-native web work. Or fail to work, if nobody builds it.

This is not a theoretical position. It is an architectural claim about what the next protocol layer needs to do.

Position Two: Types Are Proof

The second piece, “Don’t Vibe, Prove,” comes from the dependent type theory community. The author, NGrislain, makes a specific technical argument: AI code generation changes the value proposition of programming languages. When humans wrote all the code, readability and expressiveness mattered most. When AI writes the code, correctness guarantees matter most. Dependent type systems like Lean 4 provide those guarantees by construction.

The demonstration is concrete. An insertion sort algorithm in Lean 4 takes 6 lines. The proof that the algorithm is correct takes 25 lines. The type system enforces invariants at compile time: the output is sorted, contains exactly the input elements, nothing is lost or duplicated. If the code compiles, it is correct. Not tested-and-probably-correct. Provably correct.

The interesting discovery happened during the proof process. The type system demanded a proof of total ordering on the input type. This is a mathematical requirement that humans might overlook (and often do). The compiler caught a missing assumption that testing would likely miss, because tests only check the cases you think to write.
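The shape of that demonstration can be sketched in Lean 4. This is a minimal sketch, not the author's actual code: it assumes Mathlib for `List.Sorted` and the permutation notation `~`, and the proof bodies (roughly the 25 lines the author describes) are elided with `sorry` placeholders. The point is what the theorem statements force the type system to demand.

```lean
import Mathlib.Data.List.Sort

variable {α : Type} [LinearOrder α]

-- Insert `x` into an already-sorted list, keeping it sorted.
def insert' (x : α) : List α → List α
  | []      => [x]
  | y :: ys => if x ≤ y then x :: y :: ys else y :: insert' x ys

-- Insertion sort: the handful of lines of algorithm.
def insertionSort' : List α → List α
  | []      => []
  | x :: xs => insert' x (insertionSort' xs)

-- The output is sorted. Discharging this obligation is where the
-- compiler demands totality of `≤`, the assumption testing would miss.
theorem insertionSort'_sorted (l : List α) :
    List.Sorted (· ≤ ·) (insertionSort' l) := sorry

-- The output is a permutation of the input: nothing lost or duplicated.
theorem insertionSort'_perm (l : List α) :
    insertionSort' l ~ l := sorry
```

Note that the `LinearOrder α` instance in the `variable` declaration is exactly the total-ordering requirement the compiler surfaced: remove it and the theorems do not even typecheck.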

The author’s framing is sharp: “The specification IS the code. There is no distance between the invariant and the implementation.” And on the AI angle: “The adversarial feedback loop that exhausts human programmers is exactly the environment where AI excels.” Humans find the back-and-forth of satisfying a type checker tedious. AI does not get tired. The combination of dependent types and AI code generation turns the compiler into a verification oracle that never needs to sleep.

This connects directly to the Axiom story. When Axiom raised $200M to build verification infrastructure on Lean 4, the market was betting on exactly this dynamic. AI generates the code. The type system proves it correct. The compiler is the governance layer.

Position Three: Specs Become Code

Gabriella Gonzalez, writing on Haskell for All, attacks from the opposite direction. Her argument: the claim that specifications are simpler than implementations is false. A sufficiently detailed specification necessarily becomes code.

She tests this empirically. OpenAI published Symphony, a multi-agent framework, with a specification document that is roughly one-sixth the length of the implementation. Gonzalez points out that the spec is incomplete. She then used Claude Code to generate Symphony in Haskell from that specification. The result had multiple bugs. The agent “spun silently,” producing output that compiled but failed at runtime.

Her conclusion: “If you try to make a specification document precise enough to reliably generate a working implementation, you must necessarily contort the document into code.” The spec-code boundary is an illusion. As specifications become precise enough to be useful for AI code generation, they become programming in a different syntax.

This finding reinforces what we explored in When the Spec Is the Product, Who Governs the Spec? If the spec is the true control surface, and if specs must become code-level precise to be useful, then governance must operate at code-level precision. There is no comfortable abstraction layer where business stakeholders can write English prose and get verified software. The governance problem lives at the same level of detail as the implementation problem.

Where the Three Positions Converge

Each author would probably disagree with the other two on specifics. Cerf is thinking about network protocols and agent interoperability. NGrislain is thinking about compiler-enforced correctness within a single codebase. Gonzalez is thinking about the practical limits of specification as a methodology.

But they converge on one point. Governing AI-generated systems requires machine-verifiable constraints that operate at the level where the work actually happens.

Cerf says: trust verification at runtime, in the infrastructure, not after the fact by a human.

NGrislain says: correctness enforcement at compile time, in the type system, not after the fact by a test suite.

Gonzalez says: specification precision at code level, in the formal language, not after the fact by a prose document.

All three are rejecting the same thing: the belief that governance can operate at a comfortable distance from implementation. Retrospective review. Manual testing. English-language specs. These are the mechanisms most organizations rely on today, and all three authors, independently, in the same week, published arguments for why those mechanisms are insufficient.

The Uncomfortable Implication

If these three positions are correct (and the convergence suggests they are), then most organizations face a structural problem. Their governance infrastructure operates at the wrong level of abstraction.

Board-level AI policies are written in natural language. Compliance frameworks evaluate outputs after deployment. Security reviews happen at pull request time, which is after generation but before production, a narrow window that scales poorly. Specifications live in Confluence pages that no compiler has ever read.

The alternative these three pieces point toward is verification infrastructure that is embedded in the development process itself. Type systems that reject incorrect code before it reaches a human reviewer. Protocol-level trust evaluation that rejects unauthorized agent actions before they execute. Specifications written in formal languages that a machine can check, not prose that a committee can interpret.
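To make “rejects unauthorized agent actions before they execute” concrete, here is a minimal runtime-gate sketch in Python. Everything in it (`Action`, `Policy`, the tuple-based allow-list) is illustrative, not the protocol the ACM piece proposes; the structural point is that the check sits in the execution path, not in a review queue.

```python
# Sketch of runtime trust evaluation: every agent action is checked
# against machine-readable constraints *before* it executes, with no
# human in the loop. Names and policy shape are illustrative only.
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    agent: str      # identity of the requesting agent
    operation: str  # what it wants to do
    resource: str   # what it wants to act on

@dataclass(frozen=True)
class Policy:
    allowed: frozenset  # set of (agent, operation, resource) tuples

    def permits(self, a: Action) -> bool:
        return (a.agent, a.operation, a.resource) in self.allowed

def execute(action: Action, policy: Policy) -> str:
    # The gate runs at action time, not at review time: an
    # unauthorized action never executes, rather than being
    # flagged by a human afterward.
    if not policy.permits(action):
        raise PermissionError(
            f"{action.agent} may not {action.operation} {action.resource}")
    return f"executed {action.operation} on {action.resource}"

policy = Policy(allowed=frozenset({("billing-agent", "read", "invoices")}))
print(execute(Action("billing-agent", "read", "invoices"), policy))
# prints: executed read on invoices
```

A real trust layer would verify cryptographic credentials and delegation chains rather than a static allow-list, but the placement of the check is the architectural claim.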

This is harder than what most organizations are doing today. It requires investment in tools, training, and process change. But the alternative is an ever-growing verification burden handled by an ever-shrinking pool of qualified human reviewers. That path does not scale. Cerf says so. The type theory community says so. The practical engineers say so.

What Organizations Should Do

Treat verification as infrastructure, not process. Cerf’s architectural analogy is the right frame. Routing is not a process someone performs manually. It is infrastructure that operates automatically. Verification needs the same treatment. The organizations that embed verification into their toolchains (type systems, formal specs, automated proof) will scale their AI usage. The organizations that rely on human review will bottleneck.

Evaluate dependent type systems for high-consequence code. Lean 4 is the current leader, and Axiom’s $200M bet confirms institutional confidence. For cryptographic libraries, authorization logic, financial calculations, and safety-critical systems, the 25-lines-of-proof-per-6-lines-of-code ratio is a bargain compared to the cost of a production failure. Start with the code that would be most expensive to get wrong.

Stop pretending specs and code are separate artifacts. Gonzalez’s insight saves organizations from a common trap: investing in elaborate specification documents that are too imprecise to generate correct code and too complex to maintain alongside the implementation. If you need formal precision, write in a formal language. If you need flexibility, accept that your spec is a guide, not a guarantee, and invest in runtime verification to catch what the spec missed.
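The “invest in runtime verification” advice can be illustrated with ordinary postcondition checks. This is a hedged sketch: `ai_generated_sort` is a hypothetical stand-in for whatever implementation an AI produced from a prose spec, and the two assertions are the machine-checkable remnant of that spec.

```python
# Sketch: runtime checks that enforce the two invariants a prose spec
# usually states informally: output sorted, same multiset of elements.
from collections import Counter

def ai_generated_sort(xs):
    # Hypothetical stand-in for code generated from a specification.
    return sorted(xs)

def checked_sort(xs):
    out = ai_generated_sort(xs)
    # Postconditions: caught at runtime even when the spec document
    # was too imprecise to guarantee them at generation time.
    assert all(a <= b for a, b in zip(out, out[1:])), "output not sorted"
    assert Counter(out) == Counter(xs), "elements lost or duplicated"
    return out

print(checked_sort([3, 1, 2]))
# prints: [1, 2, 3]
```

These are the same invariants the Lean example proves at compile time; runtime checks are the weaker but cheaper version of the same governance move.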

Build the governance muscle at the right altitude. The pattern across all three pieces is that governance must operate where the work happens. For compiled code, that means the type system. For networked agents, that means the protocol layer. For specifications, that means formal languages. Identify where your highest-risk AI-generated code lives and push verification to that level.

The convergence of these three independent arguments is not a coincidence. It reflects a maturing understanding of what AI governance actually requires. Not policies. Not review boards. Not retrospective audits. Machine-verifiable constraints, operating in real time, at the level of implementation.

The organizations that build this infrastructure will govern their AI systems. The rest will merely hope their AI systems behave.


This analysis synthesizes From Distributed Intelligence to Verifiable Responsibility (March 2026), Don’t Vibe, Prove (March 2026), and A Sufficiently Detailed Spec Is Code (March 2026).

Victorino Group helps enterprises build verification infrastructure for AI-generated code, from type-driven governance to trust-aware architectures. Let’s talk.

If this resonates, let's talk

We help companies implement AI without losing control.

Schedule a Conversation