The AI Control Problem

Builders, Reviewers, and the Governance Nobody Mentioned

Thiago Victorino

Harrison Chase, CEO of LangChain, published a post on X in early March 2026 titled “How Coding Agents Are Reshaping Engineering, Product and Design.” It gathered 563,000 views. The thesis: PRDs are dead, the bottleneck shifts from building to reviewing, generalists win, and everyone in product teams divides neatly into builders or reviewers.

Chase gets several things right. He also gets one thing catastrophically wrong. Not by what he says, but by what he leaves out entirely.

Where Chase Is Correct

The review bottleneck is real and measurable. LinearB’s 2026 benchmarks, covering 8.1 million pull requests across 4,800 organizations, found that AI-generated PRs have a 32.7% acceptance rate compared to 84.4% for manually written ones. AI-generated PRs wait 4.6 times longer for review. The code is being produced. It is stacking up in queues waiting for human judgment.

Chase also makes an underappreciated observation about product managers: when code generation is cheap, bad PMs become exponentially more destructive. A PM who defines the wrong problem used to waste one sprint of engineering time. Now that same PM wastes one sprint of engineering time plus hundreds of generated artifacts: test suites, documentation, and infrastructure that all point in the wrong direction. The blast radius of bad intent definition grew by an order of magnitude.

His final genuine insight is sociological. “Everyone thinks their role is most advantaged by AI, and they’re right.” This observation captures something real about identity-driven bias in how people evaluate technology. PMs see agent-as-builder and feel empowered. Engineers see agent-as-colleague and feel amplified. Designers see agent-as-renderer and feel liberated. Each perspective is locally rational and globally incomplete.

The Binary That Breaks

Chase’s central framework divides the world into builders and reviewers. You either generate or you evaluate. Pick a side.

This is a false binary. It misses at least three categories of work that do not fit either role.

Operators. SREs, platform engineers, DevOps practitioners. They do not build features or review feature code. They maintain the environment where everything runs. In Chase’s framework, they are invisible. In production, they are the reason anything works at all.

Architects. People who define system-level constraints before any code is generated. They are not building individual features. They are not reviewing individual PRs. They are designing the structural properties that make individual decisions coherent across a large codebase.

Governors. The role Chase’s entire article circles without naming. Someone needs to decide what “good” means for the organization. Someone needs to define the acceptance criteria that reviewers apply. Someone needs to maintain coherence across teams, codebases, and time horizons. As we explored in Your Product Team Was Designed for a World That No Longer Exists, this is the shift from production-bottleneck thinking to intent-specification thinking. The builder/reviewer split is a simplified version of that larger transformation.

Most senior engineers oscillate between building and reviewing multiple times per day. The binary is not how work actually happens. It is how work looks from the perspective of someone selling tools for one side of it.

The Data Chase Doesn’t Cite

The entire post contains zero citations. No data. No studies. No external evidence. Every claim is presented as self-evident truth from someone whose company directly benefits from increased agent adoption. LangChain’s $1.25 billion valuation depends on the thesis that everyone should be using coding agents more aggressively. This is not disqualifying, but it is worth stating plainly.

The data that exists tells a more complicated story.

METR’s 2025 randomized controlled trial found that experienced open-source developers were 19% slower when using AI tools. They believed they were 24% faster. The perception mismatch is larger than the actual deficit. Chase’s claim that “anyone can write code now” runs directly into this finding. The tools help. They also introduce new costs that users systematically underestimate.

Veracode’s 2025 analysis found that 40-48% of AI-generated code contains security vulnerabilities across 100+ LLMs tested. Chase mentions nothing about security. Not once in the entire post.

Kiteworks’ 2026 report found that 63% of organizations cannot enforce AI purpose limits. Gartner projects 40% of firms will face shadow AI security incidents. Chase mentions nothing about governance, compliance, or regulatory requirements.

CodeRabbit’s 2025 data shows AI code review surfaces 1.7 times more issues than human review. LeadDev reports PR review time increased 91% for AI-assisted teams. The review bottleneck Chase identifies is real. His prescription (hire more reviewers, train generalists) treats it as a headcount problem. The data suggests it is an infrastructure problem.

The Generalist Ceiling

Chase argues that generalists become more valuable when coding agents handle implementation. This is partially true and importantly incomplete.

Addy Osmani’s analysis at Google found that 45% of engineering roles now expect multi-domain proficiency. But the pattern that actually wins is T-shaped: deep expertise in one domain combined with broad competence across others. Pure generalists hit a ceiling. They can direct agents across many domains, but they cannot evaluate whether the output meets the standards of any single domain.

The review bottleneck is not a problem of too few reviewers. It is a problem of insufficient evaluative depth. A generalist reviewing AI-generated database migrations, security configurations, and frontend accessibility changes in the same afternoon will miss things that a specialist would catch. Generalism is necessary. It is not sufficient.

This distinction matters for how organizations staff their teams. “Everyone is a generalist now” leads to one structure. “Everyone needs breadth, but review requires depth” leads to a very different one.

PRDs as Structured Prompts: The Seed of Something Real

One idea in Chase’s post deserves more attention than he gives it. He suggests that product requirement documents might evolve into “structured, versioned prompts” that serve as companions to prototypes rather than waterfall artifacts.

This is directionally correct and practically important. When agents generate code from specifications, the specification becomes executable. A vague PRD produces vague code. A precise, structured specification produces code that can be evaluated against explicit criteria.
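To make the idea concrete, here is a minimal sketch of a versioned specification that doubles as acceptance criteria. The structure and field names are invented for illustration; no real tool is implied.

```python
from dataclasses import dataclass, field

# Hypothetical shape for a "PRD as structured prompt".
# Every field name here is illustrative, not from any real product.
@dataclass
class Spec:
    version: str
    intent: str                      # the problem statement, in plain language
    acceptance_criteria: list[str] = field(default_factory=list)

    def evaluate(self, satisfied: set[str]) -> list[str]:
        """Return the criteria the generated output has not yet met."""
        return [c for c in self.acceptance_criteria if c not in satisfied]

spec = Spec(
    version="1.2.0",
    intent="Users can export their order history as CSV",
    acceptance_criteria=[
        "export completes under 5s for 10k rows",
        "CSV columns match the documented schema",
    ],
)

# Suppose review confirms only one criterion; the remaining gap is explicit.
gaps = spec.evaluate(satisfied={"CSV columns match the documented schema"})
```

The point of the sketch: once criteria are structured rather than prose, "does the output meet the spec?" becomes a checkable question rather than a reviewer's intuition.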

The implication Chase does not draw: if specifications become prompts, then specification quality becomes a governance concern, not just a product management skill. Who reviews the specifications? Who ensures they are consistent across teams? Who maintains versioned specifications as requirements evolve? These are organizational problems, not individual ones. As we examined in Cheap Code, Expensive Quality, when production drops to near-free, the natural governor of cost disappears. Specification discipline becomes the replacement governor, and it requires infrastructure.

Anthropic’s 2026 research on agentic coding found that agents are learning to detect uncertainty and request human input. This is the governance layer Chase omits. The agents themselves are developing the capacity to flag “I don’t know if this is right.” The organizations deploying them need the infrastructure to handle those flags at scale.
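Handling those flags at scale is mostly plumbing. A rough sketch, with an invented flag format (the research describes the behavior, not this interface):

```python
from collections import deque

# Illustrative only: the "needs_human_input" flag and this queue
# are assumptions for the sketch, not any vendor's actual API.
class EscalationQueue:
    def __init__(self):
        self._pending = deque()

    def submit(self, result: dict) -> str:
        """Agents push results; uncertain ones wait for a human."""
        if result.get("needs_human_input"):
            self._pending.append(result)
            return "queued-for-human"
        return "accepted"

    def next_for_review(self):
        """Hand the oldest flagged result to a reviewer, if any."""
        return self._pending.popleft() if self._pending else None
```

Trivial as code, but it is exactly the infrastructure question: if agents raise the flag and nothing routes it to a qualified human, the governance layer exists in name only.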

Who Reviews the Reviewers?

Chase’s framework has a recursive problem he does not address. If the bottleneck moves to review, review quality becomes the critical variable. But who evaluates review quality? Who ensures that reviewers across an organization are applying consistent standards? Who catches the reviewer who approves a PR because they are overloaded and the diff looks reasonable?

Harvard research found that junior developer employment dropped 9-10% within six quarters of AI tool adoption. The junior developers who traditionally learned by writing code are disappearing. These are the same people who, in a few years, would have become the experienced reviewers the organization now needs. The pipeline that produces qualified reviewers is being disrupted at the same time that demand for reviewers is spiking.

This is not a problem that “train generalists” solves. It is a structural workforce planning challenge that requires deliberate investment in review infrastructure: automated quality gates, architectural fitness functions, specification-driven acceptance criteria, and systematic reviewer development programs.

The Real Split

The meaningful divide is not between builders and reviewers. It is between organizations that treat review as a human-only activity and organizations that build review into their infrastructure.

The first group will hit a ceiling. They will generate more code than their reviewers can evaluate. Review quality will degrade under volume. Defects will reach production. The “AI productivity gains” will be consumed by incident response and rework.

The second group will treat review the way Uber and Stripe treat governance: as infrastructure. Automated quality gates that handle the 60% of reviews that are mechanical (style, types, known patterns). Specification-driven acceptance that compares output against structured intent. Human review concentrated on the 40% that requires judgment (architecture, business logic, trade-offs).
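A tiered gate of that kind can be sketched in a few lines. The categories and the mechanical/judgment split below are illustrative, not a real ruleset:

```python
# Sketch of a tiered review gate; category names are assumptions.
MECHANICAL = {"style", "types", "known-pattern"}   # machines clear these
JUDGMENT = {"architecture", "business-logic", "trade-off"}  # humans decide

def triage(findings: list[dict]) -> tuple[list[dict], list[dict]]:
    """Partition review findings so human attention goes only
    where judgment is actually required."""
    auto = [f for f in findings if f["category"] in MECHANICAL]
    human = [f for f in findings if f["category"] not in MECHANICAL]
    return auto, human

auto, human = triage([
    {"id": 1, "category": "style"},
    {"id": 2, "category": "architecture"},
    {"id": 3, "category": "types"},
])
```

The design choice worth noting: anything not explicitly mechanical defaults to human review, so unknown categories fail safe rather than slipping through the automated tier.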

Chase describes the symptoms accurately. The bottleneck moved. Generalists are more useful. Bad PMs are more dangerous. Everyone feels advantaged. All true.

But diagnosing symptoms without prescribing treatment is not thought leadership. It is commentary. The treatment is governance infrastructure that scales review to match generation. Not more reviewers. Better review systems.


This analysis synthesizes Harrison Chase’s How Coding Agents Are Reshaping EPD (March 2026), LinearB 2026 Benchmarks (8.1M PRs, 4,800 orgs), METR’s 2025 RCT on developer productivity, Veracode’s AI code security analysis (2025), and Addy Osmani’s analysis on engineering role evolution.

Victorino Group helps organizations build the governance infrastructure that makes AI-generated code trustworthy at scale. Let’s talk.
