The Thinking Gap Is a Governance Gap
Sophie Koonin, Staff Engineer at Monzo Bank, published a widely-shared essay last week arguing that developers should stop using AI code generation entirely. Her concerns are specific, practical, and grounded in real engineering experience: prompt engineering takes longer than writing code, generated code perpetuates poor practices, accountability collapses when nobody wrote the code, and the non-determinism of language models makes them fundamentally different from the deterministic tools we compare them to.
She is right about every symptom. She is wrong about the diagnosis.
Every problem Koonin identifies is a governance failure masquerading as a technology problem. The distinction matters, because the prescription changes completely depending on which diagnosis you accept.
The Accountability Concern
Koonin’s strongest argument is about ownership. Code you did not write is code you do not understand. When something breaks in production, who is accountable? The developer who accepted the suggestion? The tool that generated it? The organization that mandated its use?
This is a real problem. A 2026 Veracode report found that 97% of organizations have AI-generated code in production, but only 19% have visibility into which code was AI-generated. You cannot hold people accountable for code you cannot even identify.
But the accountability gap is not inherent to AI code generation. It is inherent to AI code generation without governance.
Consider the alternative: an organization that treats AI output as untrusted contributor code. Every AI-generated change goes through the same review process as code from a new hire --- or stricter. The AI’s suggestions are proposals, not decisions. The developer who approves the merge request owns the code, just as they would own code from any contributor they chose to merge.
This is not theoretical. It is how code review already works. The problem is that most organizations skipped the governance step. They gave developers AI tools without updating their review policies, ownership models, or quality gates. Then they blamed the tool when accountability collapsed.
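One lightweight way to enforce that review step in a Git-hosted workflow is a CODEOWNERS file, which blocks merges until a designated owner approves. The team names below are placeholders, not a prescription:

```
# Every change, human- or AI-authored, needs approval from the owning team.
*            @your-org/platform-reviewers

# Sensitive areas get stricter ownership.
/payments/   @your-org/payments-team
/auth/       @your-org/security-team
```

The point is not the specific mechanism; it is that the approver, not the generator, owns the code.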
The Quality Concern
Koonin argues that AI-generated code perpetuates poor practices --- accessibility violations, security vulnerabilities, performance anti-patterns. The data supports her. CodeRabbit and The Register reported 1.75x more logic errors and 2.74x more XSS vulnerabilities in AI-generated code compared to human-written code. A USENIX study found that 20% of LLM code samples recommend packages that do not exist.
These numbers are damning if you treat AI output as finished work. They are unsurprising if you treat AI output as a first draft from a junior contributor who has read a lot of code but has never shipped a production system.
The quality problem maps directly to a constraints gap. When a developer writes code, they carry implicit knowledge: the team’s architectural decisions, the security requirements for this domain, the accessibility standards the product must meet. None of this context exists in a default AI session.
But it can. This is what governance-as-code achieves. When you encode architectural constraints, security policies, and quality standards into the AI agent’s operating context --- through CLAUDE.md files, system prompts, or whatever mechanism your toolchain supports --- you are not hoping the model will figure out your standards. You are defining them as constraints the model operates within.
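As a concrete sketch of what such a file can carry, a project-level CLAUDE.md excerpt might look like the following. The rules here are illustrative placeholders; your team's actual constraints belong in their place:

```markdown
# Project constraints (excerpt)

## Architecture
- All database access goes through the repository layer; never query from handlers.

## Security
- Never interpolate user input into SQL or shell commands; use parameterized APIs.

## Accessibility
- Every interactive element needs a keyboard path and an accessible name.

## Quality gates
- New code must pass the test suite, the type checker, and the security scan before merge.
```

Encoded this way, the standards travel with every session instead of living in one developer's head.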
A Clutch survey from June 2025 found that 59% of developers use AI-generated code they do not fully understand. That statistic describes a governance failure: organizations deploying AI tools without requiring developers to understand what they ship. The fix is not to ban AI. The fix is to enforce understanding as a gate.
The Determinism Concern
Koonin makes an important technical point: language models are non-deterministic. The same prompt can produce different outputs. This makes the Industrial Revolution comparison flawed --- a loom produces the same fabric every time.
She is right that the comparison is imperfect. But the conclusion she draws --- that non-determinism makes the tool unreliable --- confuses generation with validation.
Software engineering has always dealt with non-determinism. Human developers are non-deterministic. Two engineers given the same specification will produce different implementations. We do not reject human engineering because of this variability. We manage it through testing, code review, type systems, linting, and CI pipelines.
The same infrastructure applies to AI-generated code. The output is non-deterministic. The validation is deterministic. Tests either pass or they do not. Type checks either succeed or they fail. Security scans either find vulnerabilities or they do not.
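The gate itself can be expressed in a few lines. A minimal sketch in Python, with hypothetical check names: each tool maps a non-deterministic artifact to a binary exit code, and the merge decision over those codes is fully deterministic.

```python
# A deterministic gate over non-deterministic output: generated code may
# differ run to run, but each check reduces it to a binary pass/fail.
def gate(results: dict[str, int]) -> bool:
    """Allow a merge only if every check exited 0.

    One failing check blocks the change, whoever (or whatever) wrote it.
    """
    return all(code == 0 for code in results.values())

# Two different AI-generated drafts of the same change face the same gate:
draft_a = {"pytest": 0, "mypy": 0, "security-scan": 0}
draft_b = {"pytest": 0, "mypy": 1, "security-scan": 0}  # type check failed

print(gate(draft_a))  # True: all deterministic checks passed
print(gate(draft_b))  # False: blocked, regardless of how the code was produced
```

In practice the `results` dict would come from your CI runner; the decision logic does not care where the code came from.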
Enterprise teams that have built this validation infrastructure around AI tools report 81% quality improvement, according to 2026 enterprise surveys. Teams that hand developers AI tools with no validation infrastructure report the problems Koonin describes. The difference is not the tool. The difference is the governance around the tool.
The Thinking Concern
The deepest concern in Koonin’s essay is about cognition itself. She references Maggie Appleton’s concept of epistemological decay and Mikayla Maki’s distinction that LLMs automate typing, not thinking. The worry: if developers stop writing code, they stop thinking about code, and the ability to reason about software degrades.
This concern deserves serious engagement, because it touches something real. Writing code is a form of thinking. The act of translating intent into precise instructions forces a clarity that reading someone else’s code does not.
But the concern assumes a specific workflow: developer describes what they want, AI generates code, developer accepts code without deep engagement. In this workflow, yes, thinking atrophies.
That workflow is ungoverned AI usage. It is not the only workflow available.
In a governed workflow, the developer’s role shifts from typing to directing and reviewing. They define the architecture. They specify the constraints. They review the output against their mental model and reject what does not match. They write the tests that encode their understanding.
This is not less thinking. It is different thinking. It is the difference between a chess player calculating every move manually and a chess player using analysis tools to explore lines while maintaining strategic judgment. The strategic thinking intensifies. The mechanical execution shifts.
Karpathy described this shift in January 2026: the bottleneck moved from the ability to write code to the ability to define what should be written and review what was generated. That is not a reduction in thinking. It is a relocation of thinking to a higher abstraction level.
The organizations where thinking genuinely atrophies are the ones that deployed AI tools without restructuring the developer’s role --- without making architecture, review, and validation the explicit expectations. Again: a governance failure, not a technology deficiency.
The Horizon Warning
Koonin invokes the Post Office Horizon scandal --- where a faulty software system led to wrongful prosecutions of hundreds of sub-postmasters --- as a warning about trusting computer-generated output.
The invocation is powerful. But it argues for governance, not against AI.
The Horizon disaster was not caused by a technology generating incorrect output. It was caused by an organization that trusted that output without validation, accountability, or appeal mechanisms. The system said the numbers were wrong, and the institution accepted the system’s output as truth. No human review. No governance. No accountability.
This is precisely the scenario that governed AI implementation prevents. When you treat AI output as untrusted, when you require human review, when you maintain accountability chains, when you build validation infrastructure --- you are building the safeguards that Horizon lacked.
The lesson of Horizon is not “do not use computers.” The lesson is “do not trust any system’s output --- human or machine --- without governance.”
The Energy Concern
Koonin raises the environmental cost of AI. The IEA estimates data center energy consumption at roughly 415 TWh in 2024, projected to reach 945 TWh by 2030. This is a legitimate concern that deserves honest engagement, not dismissal.
Two points of context. First, the energy cost is real but must be measured against the alternative, not against zero. If AI-assisted development reduces rework cycles, catches security vulnerabilities before deployment, and prevents the kind of production incidents that require emergency fixes --- the net energy equation is more complex than “AI uses electricity.”
Second, energy efficiency of inference is improving rapidly. The relevant metric is not total energy consumption but energy per unit of useful output. If the same model serves ten times as many useful completions per kilowatt-hour next year, the concern changes in character.
This does not make the concern invalid. It makes it a systems optimization problem, not a binary accept/reject decision.
The Actual Gap
Koonin’s essay reveals something important, but it is not what she intended. It reveals the gap between how most organizations use AI tools and how they should use them.
Eighty percent of organizations report risky behaviors from AI agents, according to 2026 enterprise surveys. Not because AI is inherently risky. Because organizations deployed AI without the governance infrastructure that any powerful tool requires.
The pattern is consistent:
- No review policies for AI-generated code results in code nobody understands
- No architectural constraints in the AI’s context results in code that violates team standards
- No validation infrastructure around AI output results in bugs and vulnerabilities
- No accountability model for AI-assisted work results in ownership gaps
- No role restructuring around AI tools results in skill atrophy
Remove any one of these governance gaps, and the corresponding concern from Koonin’s essay becomes manageable. Remove all of them, and you have a system where AI amplifies engineering capability rather than degrading it.
The Question That Matters
The debate Koonin’s essay invites --- should developers use AI? --- is the wrong question. It was settled by the 97% of organizations that already have AI code in production. The genie is not going back in the bottle, and framing the discussion as a binary choice between using AI and not using it wastes the energy that should go toward governing its use.
The right question is: what governance infrastructure does your organization need to use AI tools without the problems Koonin correctly identifies?
That question has specific, actionable answers. Review policies. Architectural constraints encoded as agent context. Validation and testing frameworks. Accountability models that treat AI output as untrusted input. Developer roles that emphasize architecture and review over typing.
None of this is easy. All of it is necessary. And none of it requires rejecting AI.
The thinking gap is real. But it is not caused by AI code generation. It is caused by the absence of governance around AI code generation. Close the governance gap, and the thinking gap closes with it.
Sources
- Sophie Koonin. “Stop generating, start thinking.” localghost.dev, February 8, 2026.
- Veracode. “State of Software Security 2026.” veracode.com, 2026.
- CodeRabbit / The Register. “AI-generated code vulnerability analysis.” December 2025.
- Clutch. “Developer AI Usage Survey.” clutch.co, June 2025.
- USENIX. “Package Hallucination in LLM Code Generation.” usenix.org, 2025.
- Maggie Appleton. “The Expanding Dark Forest and Generative AI.” maggieappleton.com, 2024.
- Mikayla Maki. “LLMs Automate Typing, Not Thinking.” Zed Blog, 2025.
- Andrej Karpathy. “A few random notes from claude coding.” X/Twitter, January 2026.
- IEA. “Electricity 2024: Analysis and forecast to 2026.” iea.org, 2024.
- Enterprise AI Governance Surveys. Various sources, 2026.
Victorino Group helps engineering organizations implement AI with the governance infrastructure that makes it work. If your team is using AI tools without review policies, architectural constraints, or accountability models, that is the gap to close.