Anthropic Says Claude Now Writes Over 80% of Its Own Production Code

Anthropic published a paper this week called “When AI builds itself,” and the headline statistic is striking: more than 80% of the code merged into Anthropic’s production codebase in May 2026 was written by Claude. Before Claude Code launched in February 2025, that number was in the low single digits.

The paper, from Anthropic’s safety institute, traces what happened between those two data points and draws some conclusions about where things are heading.

The Productivity Numbers

By Q2 2026, the typical Anthropic engineer was merging eight times as much code per quarter as engineers did between 2021 and 2025. On the research side, a March 2026 internal poll of 130 staff found a median estimate of roughly four times as much output with Anthropic’s most capable model compared to working without AI assistance.

Claude’s raw capability has improved just as quickly. Its success rate on open-ended, complex engineering problems reached 76% in May 2026, up 50 percentage points in six months. The task complexity horizon (how long Claude can maintain coherent effort on a single problem) has roughly doubled every four months.
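To make the arithmetic behind those two figures concrete, here is a minimal Python sketch. The function names are hypothetical, and the constant four-month doubling period is an assumption used only for illustration; the article reports it as an observed trend, not a forecast.

```python
def growth_factor(months: float, doubling_period_months: float = 4.0) -> float:
    """Multiplicative growth after `months`, assuming a fixed doubling period."""
    return 2.0 ** (months / doubling_period_months)


def earlier_rate(current_pct: float, gain_points: float) -> float:
    """Rate before a gain stated in percentage points (not percent)."""
    return current_pct - gain_points


if __name__ == "__main__":
    # Doubling every four months compounds to three doublings per year.
    print(growth_factor(12))           # 8.0
    # Over the same six-month window as the success-rate figure.
    print(round(growth_factor(6), 2))  # 2.83
    # 76% after a 50-point gain implies a starting rate of 26%.
    print(earlier_rate(76, 50))        # 26
```

Read this way, a horizon that doubles every four months grows eightfold in a year, and the reported success rate would have been about 26% six months before May 2026.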

One concrete example from the report: a routine software upgrade began crashing tens of thousands of training jobs at Anthropic. An engineer gave Claude access to the live incident with some text context and cluster credentials. Claude pinpointed an obscure debugging flag, reproduced the crash in an isolated environment, and confirmed a fix in about two hours. The same investigation would typically take two to three days.

Code Quality

Anthropic staff describe a clear trajectory. Claude-written code was “somewhat worse” than human-written code in late 2025. Today it’s at rough parity. The expectation from researchers who work with it daily is that it will be “strictly better” within the year.

That claim is worth examining skeptically, but the direction of travel is consistent with what Anthropic is seeing across other metrics. The 80% figure isn’t just Claude filling in boilerplate; it represents the majority of new functionality and fixes shipping to production.

The Recursive Problem

The report frames all of this as a recursion issue. Claude helps build Claude. The tools and infrastructure that make Claude better are being built, in large part, by Claude itself. That feedback loop is accelerating, and Anthropic’s position is that this dynamic was foreseeable but is arriving faster than expected.

The paper doesn’t treat this purely as cause for celebration. About half of it focuses on what this means for safety and oversight, specifically the risk that AI systems improve themselves faster than humans can maintain meaningful control.

The Pause Proposal

The most consequential section of the report proposes a mechanism for a coordinated pause of frontier AI development. The proposal envisions a scenario where multiple well-resourced labs, across multiple countries, could collectively slow or temporarily halt development if specific conditions are met.

Anthropic acknowledges the obvious objection: any pause mechanism only works if everyone actually stops. A unilateral pause by responsible actors while others continue would just shift the frontier to less safety-conscious developers. The paper doesn’t resolve this, but argues that verifiable compliance infrastructure should be built now, before a situation arises where it would be needed urgently.

This is a significant position coming from one of the leading frontier labs. It’s also a somewhat unusual one: Anthropic’s commercial success depends on continuing to ship capable AI systems, and this paper is essentially arguing that the company’s own product category may need a global circuit breaker.

The “When AI Builds Itself” paper is available on Anthropic’s website at anthropic.com/institute/recursive-self-improvement.
