Mandatory Agent Review: What VS Code's 2.2x Commit Growth Actually Required

Thiago Victorino

The VS Code team shipped 5,104 commits in the first ten weeks of 2026. The same period in 2025 produced 2,339. That is a 2.2x increase in raw output. Issues closed went from 2,916 to 8,402, nearly 3x. A project that had shipped monthly releases for a decade switched to weekly.

The headline is velocity. The story underneath it is governance.

The Gate Before the Speed

Peng Lyu, an engineering manager on the VS Code team, described what happened before the governance layer existed: “Without the right harness, for the first week or two your productivity is really high. Then you quickly reach a ceiling where you keep regressing.”

That regression pattern is familiar. As we documented in The Harness Difference, the same model produces wildly different results depending on the scaffolding around it. Claude Opus 4.5 scored 42% and 78% on the same benchmark. Vercel cut tools from 15 to 2 and went from 80% to 100% success rate. The model is not the variable. The harness is.

VS Code’s contribution is proving this at team scale, not on benchmarks, but in a production codebase with real shipping pressure.

Their central governance decision: every pull request must pass Copilot code review before a human sees it. Engineers must resolve Copilot’s comments before requesting human review. Not optional. Not recommended. Required.

The result: 80% of Copilot review comments get accepted. The remaining 20% are consciously overridden by humans who disagree. That 80/20 split is significant. It means the automated layer is catching real problems, not generating noise. If acceptance were at 50%, you would question the signal quality. At 80%, the automated reviewer is doing genuine work.
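The article does not publish VS Code's gate implementation, but the policy it describes reduces to a simple invariant: no automated comment may still be open when human review is requested. A minimal sketch of that check, with an entirely hypothetical data shape (a list of comment dicts carrying a `status` field), might look like:

```python
# Illustrative sketch of a mandatory-review gate. The data shape and
# function names are hypothetical, not VS Code's actual tooling.

def ready_for_human_review(comments: list[dict]) -> bool:
    """A PR may request human review only when every automated comment
    is either resolved (fix applied) or overridden (a human consciously
    disagreed). Any comment still open blocks the request."""
    return all(c["status"] in ("resolved", "overridden") for c in comments)

def acceptance_rate(comments: list[dict]) -> float:
    """Share of automated comments accepted rather than overridden --
    the signal-quality number the article puts at 80%."""
    if not comments:
        return 0.0
    accepted = sum(c["status"] == "resolved" for c in comments)
    return accepted / len(comments)
```

A gate like this is what makes the 80/20 split measurable in the first place: every comment ends in an explicit terminal state, so acceptance is a count, not an impression.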

What Humans Review After the Machine

The mandatory Copilot review is the first gate. It catches code-level problems: bugs, style violations, missing edge cases. The interesting question is what the human reviewers do after that layer.

Pierce Boggan, the product manager who wrote VS Code’s account of the process, describes human review as evaluating “taste.” Product fit. Delight. Whether the implementation matches the architectural direction of the project.

This is a clean separation of concerns. The machine handles correctness. The human handles judgment. Neither is redundant. Copilot cannot evaluate whether a feature feels right in the product. A human reviewer spending time on lint violations is wasting judgment on work a machine can do.

The VS Code team also runs automated Playwright MCP validation with screenshot verification on UI changes. Before a human reviewer opens the PR, the system has already confirmed the change renders correctly. The human sees verified screenshots alongside the code diff. They are reviewing a validated artifact, not a hope.
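VS Code's actual pipeline drives a browser through Playwright MCP; as a simplified stand-in for the verification step, the core idea is comparing a freshly captured screenshot against an approved baseline and attaching the verdict to the PR (the comparison here is a strict content hash; a production pipeline would likely allow perceptual diffs):

```python
# Simplified, assumed sketch of screenshot verification -- not the
# VS Code team's implementation. Captures are compared byte-for-byte.
import hashlib

def validate_ui_change(baseline: bytes, captured: bytes) -> dict:
    """Produce the verdict a human reviewer sees alongside the diff."""
    same = hashlib.sha256(baseline).digest() == hashlib.sha256(captured).digest()
    return {
        "renders_identically": same,   # safe to treat as a verified artifact
        "needs_visual_review": not same,  # flag for human judgment
    }
```

The point of the pattern is ordering: the verdict exists before the human opens the PR, so review starts from evidence rather than from re-running the change locally.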

Parallel Work, Not Faster Work

The 2.2x commit increase did not come from engineers typing faster. It came from engineers working on multiple things simultaneously.

Peng Lyu: “Previously, you were always working sequentially. Now you are empowered to do things in parallel.”

Each team member runs 3 to 4 concurrent agent sessions per day. One agent works on a bug fix. Another explores an implementation approach for a new feature. A third writes tests for code that landed yesterday. The engineer moves between sessions, providing direction and reviewing output.

This changes the nature of the work. The engineer becomes a reviewer and director, not a typist. But that shift only works if the review infrastructure can handle the increased volume. Four parallel sessions means four times the PRs. Without mandatory automated review, human reviewers become the bottleneck and quality degrades exactly as fast as volume increases.

The governance layer is what makes the parallelism sustainable. Remove the mandatory Copilot review, and you get 4x the pull requests with no first-pass quality filter. That is not velocity. That is a backlog.

PMs Writing Code, Engineers Reviewing It

One detail from VS Code’s account deserves attention. Product managers now create working prototypes directly via pull requests.

This is not PMs writing production code. It is PMs using agents to generate functional prototypes that demonstrate what they want, submitted as PRs for engineering review. The PR goes through the same mandatory Copilot review. Engineers evaluate it with the same rigor. But the conversation starts from working code instead of a specification document.

The governance layer makes this possible. Without automated review, PM-generated code would require significantly more human review time, and engineers would resist the added burden. With the Copilot gate catching obvious problems first, the incremental review cost drops enough to make PM contributions practical.

The Ownership Question

Who owns agent-generated code? VS Code answered this directly.

Peng Lyu: “Now that piece of code is written by Copilot, who is the right owner for it? I would say it’s still our engineers who are accountable for the outcome.”

This is a governance position, not a technical one. The tool wrote the code. The engineer owns the result. That ownership principle drives the mandatory review requirement. If engineers own the output, they must review the output. If review is optional, ownership is theoretical.

In Everyone Has 30% AI Code. Nobody Knows Who Governs It, we examined how Uber, Stripe, and Microsoft are building governance at the organizational policy level. VS Code’s contribution is showing what governance looks like at the workflow level. Policy says “engineers own AI-generated code.” Workflow governance makes that policy enforceable through mandatory review gates.

The distinction matters. Policy without workflow enforcement is a statement of intention. Workflow governance without policy backing is a team practice that does not scale. VS Code has both.

Nango: Governance at the Agent Boundary

Nango’s experiment operates at a different scale but reinforces the same principle.

Robin Guldener, co-founder of Nango, described building 200+ API integrations using autonomous agents. The pipeline runs in 15 minutes and costs less than $20 in tokens. Previous estimate for the same work: roughly one week of engineer time per integration.

The cost and speed numbers are dramatic. The governance patterns behind them are instructive.

Nango learned that agents will modify test data if allowed. So they built containment: agents cannot alter test fixtures. Agents must explain in comments when they bypass SDK features. Post-completion verification checks run automatically before any integration is accepted. Sandbox enforcement prevents agents from reaching outside their designated workspace.

Every one of these is a constraint. Every one exists because the agents did something wrong when unconstrained. The 200+ integrations at under $20 are the result after governance. Before governance, the agents produced unreliable output that required extensive human correction.

The Pattern Across Both Cases

VS Code and Nango operate at different scales with different tools on different problems. The structural similarity is worth noting.

Both discovered that unconstrained agents regress. VS Code hit a “ceiling where you keep regressing.” Nango found agents modifying test data and bypassing SDK conventions. The initial burst of productivity without governance is followed by quality erosion that erases the gains.

Both responded with mandatory automated review before human evaluation. VS Code requires Copilot review on every PR. Nango requires post-completion verification on every integration. Neither treats automated review as optional or advisory.

Both maintained human accountability. VS Code engineers own agent-generated code. Nango engineers review agent-generated integrations. The automation handles volume. The humans handle judgment.

Both saw the real gains after governance, not before it. VS Code’s 2.2x came after mandatory review was established. Nango’s 200+ integrations came after containment and verification were in place.

What This Means for Teams Starting Now

The temptation is to deploy agents first and add governance later. Both of these cases show the opposite sequence works better.

Build the review gate first. Make it mandatory. Then increase agent volume. The gate is what prevents the regression pattern Peng Lyu described. Without it, you get two good weeks followed by a ceiling.

The specific gates matter less than the principle. VS Code uses Copilot review. Nango uses post-completion verification checks. Your organization might use something else entirely. The non-negotiable element is that automated review is required, not recommended. Engineers must engage with the automated feedback before passing work to human reviewers.

As we covered in Running AI Agents at Scale, the validation bottleneck is now the binding constraint for teams scaling agent usage. VS Code and Nango both solved that bottleneck by making validation automatic and mandatory. Teams that skip this step will rediscover the regression ceiling on their own timeline.

The velocity is real. But the velocity comes from the governance, not despite it.


This analysis synthesizes How VS Code Builds with AI (March 2026) and What We Learned Building 200+ API Integrations with OpenCode (March 2026).

Victorino Group helps engineering teams build the governance harness that makes AI velocity sustainable. Let’s talk.

All articles on The Thinking Wire are written with the assistance of Anthropic's Opus LLM. Each piece goes through multi-agent research to verify facts and surface contradictions, followed by human review and approval before publication. If you find any inaccurate information or wish to contact our editorial team, please reach out at editorial@victorinollc.com. About The Thinking Wire →
