When the Thing Being Improved Is the Improver: Governing Recursive Self-Improvement

Thiago Victorino

A frontier lab just put its own name on the loop most governance frameworks pretend does not exist. In “Recursive Self-Improvement,” Marina Favaro and Jack Clark write that Anthropic is “delegating a growing share of AI development to AI systems themselves, which is speeding up our work.” The endpoint of that trend, stated plainly, is “an AI system capable of fully autonomously designing and developing its own successor.” They give it a name: recursive self-improvement.

This matters because of who is saying it. We have written before about Devin building Devin and about Clark’s 60 percent probability by 2028. Those were a vendor case and a forecast. This is something different. It is a frontier lab describing, with previously unreported internal data, how the loop is already running inside its own walls and proposing how to govern it. The framework deserves a governance read, not a hype read.

The Loop Anthropic Drew

The piece lays out a five-stage timeline of how AI development itself has changed. From 2021 to 2023, humans built the first Claude. From 2023 to 2025, chatbots assisted with code snippets. Through 2025 and 2026, coding agents wrote and edited files independently. Today, autonomous agents run code and delegate to other agents. The final stage is marked “20XX?” and labeled “closing the loop”: agents building and training models.

That last stage is the one every existing control assumes will not happen on its watch. Most governance, including ours, is built around a review window: an agent proposes, a human inspects, a human approves or rejects. The five-stage timeline describes the steady narrowing of that window. When agents delegate to other agents, the human is no longer in the inspection path. They are reviewing a summary of a summary, if they review at all.

Anthropic splits the work being delegated into two categories worth holding separately. There is engineering, which they define as “writing the code, standing up the infrastructure, and overseeing the model training.” And there is research: “deciding what experiments to run, interpreting what comes back, and figuring out which ideas to try next.” Engineering is the mechanical half. Research is the judgment half. The governance question turns on which half the loop is closing.

The Numbers They Put on the Table

What makes this a position piece rather than a thought experiment is the internal data. Anthropic measures how long a task its models can carry out autonomously while staying coherent: Claude Opus 3 handled four-minute tasks in March 2024, Claude Sonnet 3.7 reached 90-minute tasks a year later, and Claude Opus 4.6 reached 12-hour tasks. On open-ended coding work, the piece reports a 76 percent success rate as of May 2026, a 50 percentage-point climb in six months.
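For a rough sense of pace, the first two task-length figures above imply a doubling time. This is a minimal sketch, not a calculation from the Anthropic piece: it assumes smooth exponential growth between the two measurements and reads "a year later" as exactly 12 months. Two data points cannot confirm a trend, so treat the result as an illustration only.

```python
import math


def doubling_time_months(start_minutes: float, end_minutes: float,
                         elapsed_months: float) -> float:
    """Months per doubling, assuming exponential growth between two measurements."""
    doublings = math.log2(end_minutes / start_minutes)
    return elapsed_months / doublings


# Figures quoted in the article: 4-minute tasks (March 2024) and
# 90-minute tasks "a year later". The 12-month gap is an assumption;
# the article does not give the exact month of the second measurement.
horizon_doubling = doubling_time_months(4, 90, 12)
print(f"Implied doubling time: {horizon_doubling:.1f} months")
```

Under those assumptions, the autonomous task horizon doubled roughly every 2.7 months over that interval.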

The research-judgment numbers are the ones a governance reader should sit with. Anthropic reports that on experiment optimization, one internal model achieved roughly 3x speedups in May 2025, and a later preview reached roughly 52x by April 2026. And on the question of research taste, the share of times a model suggested a better next step than a human moved from 51 percent in November 2025 to 64 percent by April 2026.

Read that last figure carefully. It is not a capability benchmark. It is a measurement of the model outperforming the human at the exact decision the human is supposed to be supervising. When the supervisor is right less often than the supervised, the review window has not narrowed. It has inverted.

One Anthropic researcher, quoted in the piece, describes an autonomous research project this way: “Claude did all of this with pretty minimal help over 1-2 days. If a junior colleague returned with these results, I’d be mildly impressed.” Mildly impressed is the tone. The governance signal is in “pretty minimal help.”

Oversight of the Process, Not the Output

Here is where Anthropic’s framework breaks from the usual safety script, and where it earns attention. The proposed control is not a better output review. It is a commitment about the process itself.

The Anthropic Institute says it will “conduct research, in collaboration with many others, and take actions to help build the systems that a credible slowdown or pause would require.” And it names a condition: “If such systems existed, we expect that we would slow down or temporarily pause, if other developers at or near the frontier also did so in a verifiable manner.”

Notice the structure of that commitment. It is conditional on verification, and conditional on coordination. A pause only works if you can prove others paused too, which means the governance object is no longer a single model’s behavior. It is the collective rate of the loop across labs. That is an honest framing, and it is also a quiet admission: no single output review stops recursive self-improvement, because the risk does not live in any single output. It lives in the rate.

This is the move every board should copy at its own scale. Stop asking only whether the agent’s last commit was safe. Start asking whether you can still see, and still throttle, the rate at which agents are improving the agents. The automated code reviewer Anthropic runs on its own production changes is an output control. The pause commitment is a process control. You need both, and almost nobody has the second one.

Where This Piece Stops, and Why That Matters

One discipline before anyone builds a strategy on this. This is a single first-party source from a lab with a clear interest in being seen as the responsible actor in the room. It is co-authored by an Anthropic policy lead and an Anthropic co-founder. The internal numbers are “previously unreported,” which is another way of saying unaudited by anyone outside the building. Treat the framework as Anthropic’s position, not as independent consensus.

The piece is also careful where it is uncertain. It lays out three futures rather than one prediction: the trend stalls but the capability diffuses widely; efficiency compounds while humans keep setting direction; or full recursive self-improvement arrives with minimal human involvement. It does not say which of the three it expects. A governance reader should respect that restraint and resist the urge to import a timeline the source declined to commit to.

What the source does support is narrow and useful. The loop is real, it is partially running, and the only control that scales with it operates on the process rate, not on individual outputs.

Do This Now

Run one exercise this quarter. Draw your own version of Anthropic’s five-stage timeline for your own organization. Which stage are you at? Where in your pipeline does an agent now delegate to another agent with no human in the inspection path? That junction is your closing loop, and it is almost certainly already in production somewhere small.
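One way to make the exercise concrete is to write the pipeline down as a list of handoffs and filter for the ones where an agent passes work to another agent with no human inspection. The sketch below is illustrative only: the `Handoff` type, the agent names, and the example pipeline are all hypothetical and would be replaced by your own inventory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Handoff:
    source: str
    target: str
    human_review: bool  # True if a person inspects the work at this handoff


def unreviewed_agent_handoffs(handoffs, agents):
    """Return handoffs where an agent passes work to another agent uninspected."""
    return [
        h for h in handoffs
        if h.source in agents and h.target in agents and not h.human_review
    ]


# Hypothetical pipeline; every name here is made up for illustration.
agents = {"planner-agent", "coding-agent", "review-agent"}
handoffs = [
    Handoff("product-owner", "planner-agent", human_review=True),
    Handoff("planner-agent", "coding-agent", human_review=False),
    Handoff("coding-agent", "review-agent", human_review=False),
    Handoff("review-agent", "release-manager", human_review=True),
]

for h in unreviewed_agent_handoffs(handoffs, agents):
    print(f"closing-loop junction: {h.source} -> {h.target}")
```

A flat list is enough for a first pass. A real inventory would likely need a graph, since an unreviewed path can span several handoffs before a human sees anything.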

Then write down the answer to one question for that junction: if the rate of agent-improving-agent doubled next quarter, would you see it, and could you slow it? If the answer to either half is no, you do not have a recursive self-improvement strategy. You have a review window that is quietly inverting while you watch the outputs.
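The two halves of that question map onto two small mechanisms: a check that notices when the rate has doubled, and a budget that can slow it. The sketch below assumes you can count, per quarter, the agent-authored changes to agent prompts, tools, or training code. That metric, the threshold, and the names `rate_doubled` and `ChangeBudget` are hypothetical choices, not anything the Anthropic piece prescribes.

```python
from dataclasses import dataclass


def rate_doubled(previous_count: int, current_count: int,
                 factor: float = 2.0) -> bool:
    """True if the current period's count is at least `factor` times the previous."""
    if previous_count <= 0:
        # No baseline to compare against: flag any activity at all.
        return current_count > 0
    return current_count / previous_count >= factor


@dataclass
class ChangeBudget:
    """Per-period cap on agent-authored changes to agent tooling."""
    cap: int
    used: int = 0

    def allow(self) -> bool:
        """Consume one unit of budget; return False once the cap is reached."""
        if self.used >= self.cap:
            return False
        self.used += 1
        return True


# Hypothetical quarterly counts of agent-authored changes to agent tooling.
last_quarter, this_quarter = 40, 85
print("Rate doubled:", rate_doubled(last_quarter, this_quarter))

# A cap of 2 keeps the demonstration short; a real cap would be set per period.
budget = ChangeBudget(cap=2)
print([budget.allow() for _ in range(3)])
```

The check answers "would you see it" and the budget answers "could you slow it". If you cannot populate the counts at all, that gap is the finding.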


This analysis synthesizes Recursive Self-Improvement (Anthropic, June 2026).

All articles on The Thinking Wire are written with the assistance of Anthropic's Opus LLM. Each piece goes through multi-agent research to verify facts and surface contradictions, followed by human review and approval before publication. If you find any inaccurate information or wish to contact our editorial team, please reach out at editorial@victorinollc.com.
