Cognitive Debt: The Invisible Cost of AI-Generated Code
Martin Fowler published a piece this week reflecting on two industry events — the Thoughtworks Future of Software Development Retreat and The Pragmatic Summit hosted by Gergely Orosz. The post covers several threads: cognitive debt, AI work intensification, developer experience, mid-level displacement, and context-switching fatigue. It is a useful field report from someone with a front-row seat to how the industry thinks about AI.
But field reports, by their nature, describe what people are observing. They do not always connect those observations into a structural argument. Fowler’s post identifies several important problems without connecting them to a common cause or a systemic response.
That connection is where the real insight lives.
Cognitive Debt Is Not a Coding Problem
Margaret-Anne Storey, writing on February 9, 2026, defines cognitive debt as the loss of shared understanding about system design decisions. Unlike technical debt, which lives in the code, cognitive debt lives in the heads of the people who work with the code. Or more precisely, it fails to live there.
Fowler picks up the concept and extends his existing distinction between cruft and debt. In his framing, cruft is the gap in knowledge itself. Debt is the cost that gap imposes over time. This is a useful refinement. But both Storey and Fowler frame the problem as a developer concern — something that happens between engineers and their codebase.
It is bigger than that.
Cognitive debt is what happens when an organization can no longer explain why its systems work the way they do. Not just the code. The business logic. The architectural trade-offs. The compliance constraints embedded in the data model. The reason that one service talks to another in a particular sequence. When AI generates clean, functional code that nobody on the team fully understands, the organization has traded technical debt for something more dangerous: institutional ignorance that passes every automated check.
Technical debt is visible. It shows up in build times, in test failures, in the friction developers feel when modifying old code. Cognitive debt is invisible. The system works. The tests pass. The deployment succeeds. The danger only surfaces when something needs to change — a new regulation, a new integration, a security incident — and nobody can explain the current system well enough to change it safely.
The METR Paradox Is Cognitive Debt in Action
The METR study from July 2025 is one of the most important data points in the AI productivity debate, and it deserves more attention than it gets. In a randomized controlled trial, experienced open-source developers were 19% slower when using AI tools — despite perceiving themselves as 20% faster.
That perception gap is not an error in self-reporting. It is a symptom.
The developers felt faster because the mechanical parts of coding — generating boilerplate, writing standard patterns, producing initial implementations — happened faster. But the total time increased because of what came after: reading unfamiliar code, verifying assumptions they did not make, debugging behavior they did not design. The AI accelerated production and decelerated comprehension. The net effect was negative.
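The shape of that gap is easy to see with a toy calculation. The minutes below are invented for illustration, not METR's measurements; only the direction of the effect comes from the study.

```python
# Illustrative arithmetic only: these numbers are made up to show how a task
# can feel faster while taking longer overall. They are not METR's data.

def total_time(production: int, comprehension: int) -> int:
    """Total task time is production (writing code) plus comprehension
    (reading, verifying, and debugging what was written)."""
    return production + comprehension

# Hypothetical unassisted task: 60 min writing, 20 min verifying one's own code.
unassisted = total_time(production=60, comprehension=20)   # 80 min

# Hypothetical AI-assisted task: writing drops to 25 min (feels much faster),
# but verifying unfamiliar generated code grows to 70 min.
assisted = total_time(production=25, comprehension=70)     # 95 min

# The production phase sped up dramatically, which is what the developer feels...
perceived_speedup = 1 - 25 / 60
# ...but total time rose, matching the direction of the METR finding.
actual_slowdown = assisted / unassisted - 1

print(f"perceived production speedup: {perceived_speedup:.0%}")
print(f"actual total slowdown: {actual_slowdown:.0%}")
```

The developer's perception tracks the phase they actively experience; the slowdown hides in the comprehension phase they did not choose.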
This is cognitive debt accumulating in real time. The developer is writing code they understand less than if they had written it themselves. Each AI-assisted contribution adds a thin layer of opacity to the codebase. Any individual contribution is manageable. The accumulation is not.
Fowler’s post discusses the METR study in the context of whether AI actually improves developer productivity. That is the wrong frame. The more important question is: what is happening to the organization’s understanding of its own systems while the productivity debate plays out?
Work Intensification Is Not a Side Effect. It Is the Default.
Aruna Ranganathan and Xingqi Maggie Ye from UC Berkeley conducted an eight-month ethnographic study of 40 workers at a technology company, published in the Harvard Business Review on February 9, 2026. Their findings are direct: workers using AI tools worked at a faster pace, took on a broader scope of tasks, and extended work into more hours of the day — often without being asked to do so.
Three patterns emerged:
- Task absorption: workers absorbed tasks previously done by others, without a reduction in their own workload.
- Multitasking overload: the apparent simplicity of AI-assisted work encouraged constant task-switching.
- Erosion of breaks: the feeling that work was “easier” made rest feel unnecessary, until burnout arrived.
The burnout numbers are striking: 62% of associates reported burnout symptoms, compared to 38% of C-suite executives. The people doing the actual AI-assisted work are burning out. The people approving the AI strategy are not. This gap in experience creates a gap in governance: the decision-makers do not feel the problem they are creating.
Fowler mentions this study as evidence that AI intensifies work. True. But the governance implication is sharper: organizations that deploy AI without monitoring its impact on work patterns will burn through their people while believing they are making them more productive. The tool creates the appearance of ease while increasing the actual load. Without governance structures that measure real workload — not just output metrics — the damage is invisible until attrition makes it obvious.
The sample is small. Forty workers at one company is suggestive, not definitive. But the pattern aligns with what we see in client organizations, and the mechanism is intuitive: when AI makes any individual task feel easier, the organizational response is to assign more tasks, not to redistribute the freed capacity toward deeper work.
Developer Experience IS Agent Experience — and That Is a Governance Argument
Laura Tacho, CTO of DX, made a statement at the Pragmatic Summit that Fowler highlights: “The Venn Diagram of Developer Experience and Agent Experience is a circle.”
This is precisely correct. Code that is well-modularized, clearly named, and properly documented is easier for both humans and AI agents to work with. Bad code is hard for everyone — carbon and silicon alike.
But Tacho’s second observation is the one that matters for organizations making investment decisions: “Developer experience is still a way bigger lever for almost every company than AI-assisted engineering.”
This cuts against the prevailing narrative. Most organizations are investing heavily in AI tooling while underinvesting in the foundations that make AI tooling effective: code quality, documentation, clear architecture, well-defined interfaces. They are buying a powerful engine and installing it in a car with bald tires.
This is a governance decision masquerading as a technology decision. An organization that prioritizes AI tool adoption over developer experience is optimizing for speed on a road it cannot navigate. The governance question is: who is responsible for ensuring that the foundation supports the acceleration? In most organizations, nobody is. The AI tools get a budget line. The code quality improvement does not.
The Missing Middle
Fowler reports a concern from the summit conversations: mid-level developers face the greatest displacement risk from AI. Juniors benefit from AI-as-mentor. Seniors leverage AI from a position of architectural understanding. The mid-level developer — experienced enough to be expensive, not experienced enough to be irreplaceable — is caught between.
This framing has merit but misses the structural problem. Mid-level developers are not just a role. They are an institution’s operational memory. They are the people who know why the authentication service was built the way it was, why the data pipeline handles timezone conversion in that particular module, why the API versioning strategy changed in 2024. They carry the cognitive capital that prevents cognitive debt.
If AI displaces mid-level developers — through layoffs, through reclassification, through attrition that goes unreplaced — the organization loses the layer of understanding that sits between architectural vision and code-level execution. The seniors know what the system should do. The juniors know what the AI just generated. Nobody knows what the system actually does and why.
This is cognitive debt, compounding at the organizational level. And it is a governance failure to allow it to happen without mitigation — without knowledge capture, without documentation mandates, without deliberate investment in the institutional memory that mid-level developers carry.
Context-Switching Is a Governance Problem Too
Camille Fournier observes that managing multiple AI agents introduces a management-like context-switching burden to programming. Instead of thinking deeply about one problem, the developer is now supervising several agents, each working on a different task, each requiring review and course-correction.
This is real. And it connects directly to the cognitive debt problem.
Deep understanding comes from sustained attention. The developer who spends four hours working through a complex problem emerges with a mental model of that problem that is durable and transferable. The developer who spends four hours supervising six agents working on six problems may have generated more code, but their understanding of any individual problem is shallow.
Multiply this across a team, across months, and you have an organization that ships fast and understands little. The codebase grows. The comprehension shrinks. The cognitive debt compounds.
Fowler reports but does not extend this observation. The extension is straightforward: if multi-agent workflows are the future of software development, then governance must account for the comprehension cost. Sprints need to budget time for understanding, not just output. Code review must assess whether the reviewer actually understands the code, not just whether it passes tests. Architectural decision records must be mandatory, not optional.
IDE as Orchestrator: The Right Instinct, Half the Answer
Fowler describes an emerging model where the IDE becomes an orchestrator — using LLMs for semantic tasks and deterministic tools for mechanical changes. This is architecturally sound. It separates what AI does well (pattern recognition, natural language understanding, code generation) from what deterministic tools do well (refactoring, formatting, type checking).
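The routing decision at the heart of that model can be sketched in a few lines. Everything below is hypothetical — the task kinds, handlers, and routing table are invented for illustration and do not model any real IDE or LLM API.

```python
# Hypothetical sketch of the IDE-as-orchestrator idea: route each task either
# to a (stubbed) LLM for semantic work or to a deterministic tool for
# mechanical changes. All names here are invented for illustration.

from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Task:
    kind: str       # e.g. "summarize", "rename_symbol", "format"
    payload: str


def llm_handler(task: Task) -> str:
    # Stand-in for a semantic step (pattern recognition, generation).
    return f"LLM handled {task.kind!r}"


def rename_tool(task: Task) -> str:
    # Stand-in for a deterministic refactoring tool: same input, same output.
    return f"deterministic rename of {task.payload!r}"


def format_tool(task: Task) -> str:
    return f"deterministic formatting of {task.payload!r}"


# The orchestrator's core decision: mechanical changes go to deterministic
# tools; only genuinely semantic work falls through to the LLM.
ROUTES: Dict[str, Callable[[Task], str]] = {
    "rename_symbol": rename_tool,
    "format": format_tool,
}


def orchestrate(task: Task) -> str:
    handler = ROUTES.get(task.kind, llm_handler)  # fall back to the LLM
    return handler(task)


print(orchestrate(Task("rename_symbol", "old_name -> new_name")))
print(orchestrate(Task("summarize", "module docstring")))
```

The design choice worth noticing is that the deterministic path is the default for anything it can handle; the LLM is the fallback, not the front door. That is what keeps mechanical changes reproducible.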
But orchestration without governance is just faster chaos. The IDE-as-orchestrator model answers the question of how to use AI effectively. It does not answer the question of who is accountable for the output, how the organization maintains understanding of what was generated, or what happens when the orchestrated output is wrong in ways that only surface later.
The orchestrator model needs a governance layer. Not just technical guardrails — though those matter — but organizational accountability for the output that flows through the orchestration pipeline. Who reviews it? Who owns it? Who is responsible when it fails? These are not engineering questions. They are governance questions.
The Structural Response
Fowler’s post is valuable because it captures what thoughtful practitioners are observing on the ground. Cognitive debt is real. Work intensification is real. The mid-level squeeze is real. Context-switching is real. These are genuine problems being felt by real teams.
What the post does not provide — and what the industry conversation has not yet converged on — is a structural response to these problems. Individual observations require systemic solutions.
The structural response is governance. Not governance as bureaucracy. Governance as the organizational discipline of maintaining understanding, accountability, and human authority over AI-generated output.
Concretely:
Cognitive debt requires knowledge governance. Architectural decision records, mandatory documentation for AI-generated code, regular knowledge-sharing sessions where teams explain their systems to each other. If nobody can explain why the code works the way it does, the code is a liability regardless of how cleanly it runs.
Work intensification requires workload governance. Measuring actual effort, not just output. Monitoring for the task-absorption pattern the HBR study identified. Ensuring that AI-generated productivity gains translate to reduced load or deeper work, not to more tasks piled onto the same people.
The mid-level gap requires talent governance. Deliberate knowledge capture from mid-level developers. Investment in growing juniors into the mid-level role, not skipping it. Recognition that organizational memory is an asset that must be maintained, not a cost to be optimized away.
Context-switching requires attention governance. Budgeting time for deep understanding. Structuring multi-agent workflows so that developers maintain comprehension of what they are supervising. Treating review as cognitive work, not administrative overhead.
Orchestration requires accountability governance. Clear ownership of AI-generated output. Substantive review processes. Closed feedback loops that track whether AI output performed as expected in production.
Every problem Fowler identifies maps to a governance response. The problems are not new — verification debt, agency erosion, automation paradoxes. What is new is the speed at which they compound when AI generates code faster than humans can understand it.
What Fowler’s Audience Told Him
Fowler notes that at the Thoughtworks retreat, roughly one-third of attendees arrived “deeply skeptical” about AI’s value and left converted. He presents this as meaningful data. It is not — at least, not in the direction he implies.
A self-selected group of Thoughtworks employees at a company retreat is not a representative sample of the industry. More importantly, “converted” is doing heavy lifting in that sentence. Converted to what? To believing AI is useful? To believing it improves productivity? To believing it should be adopted without governance?
The METR data suggests that the perception of improvement and the reality of improvement diverge significantly. Developers who feel 20% faster while being 19% slower are, in a sense, “converted” — they believe in the tool’s value. The conversion is genuine. The belief may not be accurate.
This is not an argument against AI adoption. It is an argument for governed adoption — adoption that measures outcomes rather than perceptions, that tracks comprehension alongside velocity, that treats cognitive debt as a first-class metric rather than an externality.
The Uncomfortable Conclusion
The AI-in-software-development conversation is dominated by capability. What can AI do? How much code can it write? How fast? These are the easy questions because they have visible, measurable answers.
The hard questions are about what AI costs in ways that don’t appear on dashboards. The cognitive debt that accumulates when teams stop understanding their systems. The burnout that compounds when work intensifies behind a veil of efficiency. The institutional memory that evaporates when mid-level developers disappear. The comprehension that erodes when attention fragments across agent supervisory tasks.
Fowler’s post captures these costs as observations. The next step is treating them as governance requirements. Not because governance is glamorous — it is not — but because the alternative is organizations that ship faster and faster while understanding less and less, until the gap between velocity and comprehension produces failures that nobody on the team can diagnose.
That gap has a name. It is cognitive debt. And unlike technical debt, you cannot refactor your way out of it.
Sources
- Martin Fowler. “Fragments: February 13.” martinfowler.com, February 13, 2026.
- Margaret-Anne Storey. “Cognitive Debt in Software Development.” Published February 9, 2026.
- Aruna Ranganathan and Xingqi Maggie Ye. “Research: AI Tools Can Actually Slow Down Workers.” Harvard Business Review, February 9, 2026.
- METR. “Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity.” metr.org, July 2025.
- Laura Tacho. Remarks at The Pragmatic Summit. San Francisco, February 11, 2026.
- Camille Fournier. Remarks on agent context-switching. February 2026.
- Veracode. “The State of Generative AI Code Security.” veracode.com, 2025.
- Sonar. “2026 State of Code Developer Survey.” sonarsource.com, January 2026.
Victorino Group helps organizations build governance infrastructure that keeps pace with AI capability — so your teams ship with understanding, not just velocity. If cognitive debt is accumulating faster than your organization can account for it, let’s talk.