- Home
- The Thinking Wire
- 512,000 Lines of 'Safety-First': What Claude Code's Source Leak Reveals About AI Governance Theater
512,000 Lines of 'Safety-First': What Claude Code's Source Leak Reveals About AI Governance Theater
On March 31, 2026, Anthropic published the complete source code of Claude Code to npm. Not deliberately. A 59.8 MB source map file containing roughly 1,900 files and 512,000 lines of TypeScript shipped in version 2.1.88 because someone forgot to add an entry to .npmignore.
Within hours: 84,000 GitHub stars. Thousands of forks. The entire architecture of the tool that Anthropic positions as the safe, responsible way to give AI agents access to your codebase was available for anyone to read.
This was Anthropic’s second leak in five days. On March 26, an unreleased model codenamed Mythos appeared briefly before being pulled. The company that markets itself on safety, interpretability, and responsible deployment had two operational security failures in less than a week.
Anthropic’s official response: “No sensitive customer data or credentials were involved. Release packaging issue caused by human error, not a security breach.”
They are technically correct. And they are missing the point entirely.
The Root Cause Is the Story
The suspected cause deserves attention. The missing .npmignore entry may trace to a known bug in Bun, the JavaScript runtime (oven-sh/bun#28001). Anthropic acquired Bun’s parent company in December 2025. Their own toolchain may have caused the leak.
This is not ironic. It is instructive. A company building AI governance tools could not govern its own build pipeline. The failure was not sophisticated. Nobody exploited a zero-day. Nobody ran a social engineering campaign. A configuration file was incomplete. That is it.
The lesson for every organization deploying agentic tools: if Anthropic cannot prevent a .npmignore oversight from exposing half a million lines of proprietary code, what makes you confident that your permission configurations, your CLAUDE.md files, your agent trust boundaries are correctly specified?
Configuration errors do not announce themselves. As we documented in From 0% to 91%: Why Agent Safety Is Configuration-Dependent, a single configuration change can swing agent behavior from safe to dangerous. The Claude Code leak is that principle applied to the vendor itself.
What the Code Actually Shows
The leaked source is more revealing than a typical code leak because it exposes the distance between Anthropic’s public positioning and its operational reality. Several findings deserve scrutiny.
Anti-Distillation: Security Through Pollution
The codebase contains an ANTI_DISTILLATION_CC flag that injects fake tool definitions into API traffic. The purpose: poison training data that competitors might harvest by intercepting Claude Code’s API calls. If someone captures the traffic and trains on it, the fake definitions corrupt their model.
The concept is reasonable. The implementation is not. Security researchers found it can be bypassed by a MITM proxy that strips a single field, or by setting one environment variable. This is security through obscurity, and not even good obscurity. It is the software equivalent of hiding a key under the doormat and hoping nobody checks there.
Undercover Mode: When AI Hides Its Identity
A file called undercover.ts contains logic to strip AI authorship signals when Claude Code contributes to open-source repositories. The system prompt associated with this mode includes the instruction: “Do not blow your cover.”
There is no mechanism to force this mode off. The feature exists to make AI-generated code appear human-written in public repositories.
Set aside the ethical questions about attribution and transparency for a moment. Consider what this means operationally. Organizations that depend on knowing whether code was AI-generated (for compliance, for audit trails, for liability purposes) cannot trust authorship metadata from repositories where Claude Code was used. The tool is designed to obscure its own involvement.
KAIROS: The Autonomous Agent Nobody Announced
The code reveals an unreleased system called KAIROS: an always-on autonomous agent mode with daemon workers, nightly memory distillation, GitHub webhook integration, and cron scheduling. This is not a feature that Anthropic has publicly discussed.
KAIROS represents a qualitative shift from “AI assistant that helps when asked” to “AI agent that acts continuously without prompting.” The 44 hidden feature flags found in the codebase tell the same story: background agents, multi-agent orchestration, voice commands, browser control via Playwright, agents that sleep and self-resume.
Anthropic is building an agentic operating system. They have not disclosed this publicly. Organizations making deployment decisions based on Anthropic’s public documentation are making those decisions with incomplete information.
The Permission Model Under the Hood
We analyzed Claude Code’s permission architecture when auto mode launched. The leaked source confirms and extends those concerns. The code contains 23 bash security checks, but the permission chain includes early-allow shortcuts that bypass downstream validators. A broad permission rule like Bash(git:*) can be exploited to escape the intended constraint boundary.
The 4-stage context compaction pipeline is more concerning. When Claude Code compresses its context window, the compaction process can be exploited for context poisoning. Malicious instructions placed in a CLAUDE.md file can survive compaction by being classified as “user feedback” rather than external input. The agent then treats poisoned instructions as legitimate user intent.
This connects directly to a pattern we have tracked. In 46.5 Million Messages in 2 Hours, we argued that agent security is an architecture problem, not a prompting problem. The leaked code proves it. The vulnerabilities are structural. They exist in the compaction pipeline, in the permission chain shortcuts, in the trust boundary between project files and system files. No amount of prompt engineering fixes architecture.
The Telemetry Question
The leaked code reveals that every file Claude Code examines is stored in plaintext JSONL and uploaded to Anthropic’s servers. Remotely managed settings are pushed hourly. Sentry error reporting captures the working directory path.
Anthropic’s privacy policy presumably covers this. But “covered by privacy policy” and “understood by users” are different things. A developer who installs Claude Code to help write code may not realize they are also providing a complete map of their codebase structure, file access patterns, and working directory organization.
For enterprise customers operating under data governance requirements, this telemetry raises questions that should have been asked before deployment, not discovered through a source code leak.
The Supply Chain Timing
Here is a detail that deserves more attention than it has received. During the same hours that Claude Code’s source was leaking on npm (the early hours of March 31 UTC), a trojanized version of the popular axios npm package was live on the registry. Published at 00:21 UTC and removed at 03:29 UTC, it contained a cross-platform remote access trojan.
No evidence connects these events. But the coincidence illustrates the compounding nature of supply chain risk. An organization that pulled Claude Code v2.1.88 during those three hours was exposed to both a source code leak from its AI tooling vendor and a supply chain attack from a compromised dependency. In the same npm install.
As we argued in 46.5 Million Messages in 2 Hours, security requires architecture, not just good intentions. The npm ecosystem, where AI developer tools and their dependencies live, provides neither.
The Architecture Insight Worth Keeping
Not everything in the leak is damning. Sebastian Raschka’s analysis of the codebase architecture reveals something valuable for anyone building or evaluating agentic tools: the competitive advantage is not the model. It is the harness.
Claude Code’s architecture includes repository context loading with prompt caching at dynamic boundaries, specialized tool definitions, a three-layer memory system (session, project, global), and a subagent architecture supporting fork, teammate, and worktree patterns. The sophistication is in the orchestration layer, not the LLM underneath.
This matters for governance. If the risk surface of an agentic tool is primarily in the harness (the permission model, the context pipeline, the telemetry system, the tool definitions) rather than in the model, then governance must focus on the harness. Model evaluations and safety benchmarks, the things AI companies publish and regulators measure, miss most of the attack surface.
What This Means for Organizations
The Claude Code leak is not primarily an Anthropic problem. It is a category problem. Every agentic coding tool (Cursor, Windsurf, Copilot, Cline, and their successors) has a harness. Every harness has permission models, context pipelines, telemetry systems, and trust boundaries. Most of those harnesses are closed-source. You are trusting them based on marketing materials and terms of service.
The leak collapsed the cost of attacking Claude Code specifically. Reverse-engineering overhead dropped to zero overnight. But the systemic insight is more important: the vulnerabilities found in Claude Code’s architecture (context poisoning via compaction, permission bypasses through broad rules, telemetry that maps codebases) are not Claude-specific. They are structural features of agentic architectures.
Three questions for any team deploying agentic developer tools:
What does your agent’s permission model actually allow? Not what the documentation says. What the implementation permits. The Claude Code leak shows that early-allow shortcuts can bypass downstream validators. If you cannot read the source of your agent’s permission system, you are trusting a black box to enforce your security boundaries.
What telemetry does your agent send, and to whom? “We take privacy seriously” is not an answer. The specific data, the retention policy, the transmission frequency, and the conditions under which remotely pushed configuration changes can alter agent behavior: these are governance requirements, not feature requests.
What is your response plan when (not if) your AI tooling vendor has a security incident? Anthropic had two in five days. Your vendor will have theirs. The question is whether you learn about it from their disclosure or from Hacker News.
The Real Governance Failure
Anthropic will fix the .npmignore file. They will probably audit their build pipeline. The immediate technical failure will be resolved.
The deeper failure persists. A company whose entire brand proposition rests on “we build AI safely” demonstrated that it cannot safely publish an npm package. Twice in a week, proprietary information that Anthropic intended to keep private became public through elementary operational failures.
This is not about holding Anthropic to an impossible standard. Every software company ships bugs. The issue is the specific claim being made. When your differentiator is safety and governance, your operational reality must match your rhetoric. Two leaks in five days is not a rounding error. It is a data point about the distance between aspiration and execution.
For organizations evaluating AI tooling vendors, the lesson is straightforward: do not evaluate vendors on their safety claims. Evaluate them on their safety operations. Ask for incident histories. Ask for build pipeline documentation. Ask for the boring, operational evidence that safety is a practice, not a marketing position.
The 512,000 lines of TypeScript that Anthropic accidentally published are already forked, analyzed, and archived. That code cannot be unpublished. What can change is how organizations use this moment: not as ammunition against one vendor, but as evidence that agentic tool governance requires the same rigor we apply to any other critical infrastructure.
The source code is out. The question now is what we do with what it taught us.
This analysis synthesizes Anthropic Source Code Data Leak (Fortune, March 2026), Claude Code Source Leak Analysis (Alex Kim, March 2026), Anthropic Leaked Source Code (Axios, March 2026), Claude Code Source Leak: With Great Agency Comes Great Responsibility (Straiker, March 2026), Claude Code Leaked via npm Packaging (The Hacker News, April 2026), Claude Code’s Source Code Leak (VentureBeat, March 2026), Diving into Claude Code’s Source Code (Engineer’s Codex, March 2026), and Claude Code Source Leak Privacy Nightmare (The Register, April 2026).
Victorino Group helps engineering and security teams audit the governance architecture of their agentic tools before the next leak makes audit mandatory. Let’s talk.
All articles on The Thinking Wire are written with the assistance of Anthropic's Opus LLM. Each piece goes through multi-agent research to verify facts and surface contradictions, followed by human review and approval before publication. If you find any inaccurate information or wish to contact our editorial team, please reach out at editorial@victorinollc.com . About The Thinking Wire →
If this resonates, let's talk
We help companies implement AI without losing control.
Schedule a Conversation