The False Choice

The Phase Shift in Software Engineering

Thiago Victorino

In January 2026, Andrej Karpathy — former Director of AI at Tesla, founding member of OpenAI — published an account that crystallized what thousands of engineers were feeling but couldn’t articulate: in a matter of weeks, he flipped his work ratio from 80% manual code to 80% AI agents.

This isn’t marketing hype. It’s a field report from one of the world’s most qualified engineers, working with real code.

The 80/20 Flip

Karpathy describes a shift that happened between November and December 2025. Tools like Claude Code and Codex crossed what he calls a “threshold of coherence” — the point where AI agents became useful enough to justify changing your entire workflow.

The result: he now programs mostly in English, describing in words what the code should do.

Boris Cherny, creator of Claude Code at Anthropic, went further. In December 2025, he revealed that over the past 30 days he had submitted 259 pull requests — 497 commits, 40,000 lines added, 38,000 removed. Every line written by Claude Code + Opus 4.5.

This data matters not for the volume, but for what it reveals: when the tool reaches sufficient coherence, the bottleneck shifts from the ability to write code to the ability to direct and review.

The Productivity Paradox

Before celebrating the numbers, it’s worth confronting the counter-evidence.

The METR laboratory ran a randomized controlled trial in the first half of 2025 with 16 experienced developers working on mature open-source repositories. The result surprised everyone: developers using AI tools took 19% longer to complete tasks — despite believing they were 20% faster.

Simon Willison, developer and vocal advocate for AI tools, estimates that LLMs make him 2-5x more productive specifically at the writing-code portion of his work — which is a fraction of actual software engineering.

The contradiction between this data and the reports from Karpathy and Cherny isn’t a paradox — it’s a context indicator. The METR study evaluated tools from February-June 2025 (Cursor Pro with Claude 3.5/3.7), in repositories the developers already knew intimately. Karpathy describes tools from December 2025 onward. The evolution between these periods is significant.

But the more important point is different: the developers in the METR study kept using the tools after the experiment. 69% continued with Cursor. This suggests the perceived gains go beyond speed, including reduced cognitive fatigue and an investment in learning the tools. Karpathy and Willison converge on a deeper distinction: “expansion” versus “acceleration.”

Acceleration versus Expansion

This is the most relevant distinction for enterprises.

Karpathy doesn’t say he does the same things faster. He says he does things he wouldn’t have done before — because they didn’t justify the time investment, or because they required knowledge he didn’t possess.

Willison describes the same phenomenon: the value of AI isn’t finishing tasks faster, but enabling projects that wouldn’t have justified the time without assistance.

For a company, this means: the right question isn’t “how much faster does my team code?” but “what problems can my team now solve that were previously infeasible?”

The 10x Engineer in the Age of Agents

Karpathy raises the question: what happens to the productivity ratio between the median engineer and the exceptional one?

Nate Meyvis, in an analysis published on his blog, argues this ratio likely grows. Exceptional engineers benefit disproportionately because they know how to formulate problems with greater precision, review results with better judgment, and compose solutions at scale.

But there’s an important detail: Meyvis suggests that among 95th-99th percentile engineers, AI may level the playing field — everyone accesses similar error-detection and design-improvement tools. The real divergence happens at the 99.9th percentile: engineers who combine strategic vision with technical mastery amplify their impact disproportionately.

This has a direct implication for hiring and talent development: the differential value of the exceptional engineer is no longer in typing speed or API memorization — it’s in the ability to define problems, evaluate trade-offs, and maintain architectural coherence when the agent cannot.

The Capability Overhang — The Real Gap

OpenAI, led by Sam Altman, framed this gap as the “capability overhang” at Davos: the growing distance between what AI can already do and what organizations actually use.

This gap isn’t technological — it’s organizational. What’s missing are adapted workflows, governance processes for AI-generated code, and above all “agentic engineering” skills: the ability to design, govern, and operate agents in production.

IDC data projects that over 90% of enterprises will face critical skills shortages in 2026, with potential losses of $5.5 trillion — reduced from $6.5 trillion estimated two years prior, with AI itself already mitigating $1 trillion of the impact. AI skills are now the most valued and demanded competency set, cited by 45% of respondents in the report.

The paradox: the more capable AI becomes, the more visible the engineering gap grows. Closing that gap doesn’t require smarter models — it requires a new engineering discipline.

The Risks Karpathy Identifies

Karpathy is not naive about the limitations. He catalogs specific problems that persist:

Subtle conceptual errors: Models no longer make syntax mistakes. They make assumption mistakes — forming silent premises and charging ahead without checking. The kind of error a hasty junior developer would make.

Over-engineering: Agents tend to unnecessarily complicate code. Karpathy describes cases where 1,000 lines could be 100 — but the agent only simplifies when challenged.

Skill atrophy: He notes that the ability to write code manually is already beginning to atrophy. Generation and discrimination are distinct cognitive capabilities — you can review code even when you can no longer write it with the same fluency.

Sycophancy: The models are excessively agreeable. They don’t challenge premises, don’t present trade-offs, don’t push back when they should. Plan mode improves this, but Karpathy argues for a lightweight, inline plan mode integrated directly into the flow.

What This Means For Your Organization

The phase shift has already happened. The question isn’t whether, but how your organization responds.

For technology leaders: The bottleneck is no longer the ability to write code. It’s the ability to define what should be written, review what was generated, and maintain systemic coherence. Invest in engineers who think in architecture, not engineers who type fast.

For product managers: The expansion of feasible scope means previously impossible prototypes are now trivial. Use this for rapid validation — but don’t confuse prototyping speed with product quality.

For the organization: The capability overhang is real. If your team is still debating whether AI is useful for coding, you’re already behind. The question now is how to govern, measure, and scale the use of agents.

Karpathy closes his account with an observation worth reflecting on: intelligence arrived before infrastructure. Capabilities are ahead of integrations, organizational workflows, diffusion processes. 2026 is the year the industry metabolizes the new capability.

The question for every company is: will you metabolize, or will you be metabolized?


Sources

  • Andrej Karpathy. “A few random notes from claude coding.” X/Twitter, January 2026.
  • Nate Meyvis. “The future of 10x engineering.” natemeyvis.com, 2026.
  • METR. “Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity.” metr.org, July 2025.
  • Boris Cherny. Post detailing 259 pull requests with Claude Code. X/Twitter, December 2025.
  • Simon Willison. “No, AI is not Making Engineers 10x as Productive.” simonwillison.net, August 2025.
  • IDC. “The $5.5 Trillion Skills Gap.” AI workforce readiness report, 2025.

If this resonates, let's talk

We help companies implement AI without losing control.
