
Claude Code Source Code Leak: Everything That Was Revealed

Anthropic accidentally leaked 512K lines of Claude Code source code via npm. Source maps, secret features, KAIROS and more — here's what it all means.

9 min read


A forgotten debug file in npm exposed the internal architecture, secret features, and roadmap of Claude Code. Here's what matters.


Context

On March 31, 2026, someone at Anthropic made a mistake that became the most talked-about event in the dev community: they published a Claude Code update to npm with a file that shouldn't have been there.

The result? 512,000 lines of TypeScript source code, roughly 1,900 internal files, and an unprecedented level of access to how Claude Code actually works under the hood. A mirror of the code posted to GitHub surpassed 84,000 stars within hours. The original post on X racked up over 21 million views.

Anthropic confirmed the incident, classified it as a human packaging error, and stated that no customer data or credentials were exposed. But the intellectual-property damage was already done.


How It Happened

The culprit was a source map — a debug file with a .map extension that allows reconstructing original source code from compiled JavaScript. Developers use source maps for debugging, but they should never ship to production.

Version 2.1.88 of the @anthropic-ai/claude-code npm package included a cli.js.map file weighing 59.8 MB. Inside it, a field called sourcesContent contained the full text of every source file in the project — readable TypeScript with comments, variable names, and all.

Boris Cherny, a Claude Code engineer at Anthropic, confirmed on X that it was plain developer error, not a tooling bug. Someone forgot to add *.map to .npmignore — a single line that would have prevented everything.
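Recovering the originals from a shipped map requires no special tooling at all: a source map is plain JSON whose sourcesContent array mirrors the sources array entry by entry. A minimal sketch, using a made-up example map rather than the leaked file:

```typescript
// A source map is ordinary JSON; `sources` lists file paths and
// `sourcesContent` (when present) holds their full original text.
interface SourceMap {
  sources: string[];
  sourcesContent?: (string | null)[];
}

// Pair each source path with its embedded content, skipping entries
// whose content was omitted from the map.
function extractSources(map: SourceMap): Map<string, string> {
  const out = new Map<string, string>();
  map.sources.forEach((path, i) => {
    const content = map.sourcesContent?.[i];
    if (typeof content === "string") out.set(path, content);
  });
  return out;
}

// Hypothetical example map — not the leaked cli.js.map:
const example: SourceMap = {
  sources: ["src/agent.ts", "src/tools.ts"],
  sourcesContent: ["export const agent = 1;", null],
};
const files = extractSources(example);
```

Everything an attacker needs is one JSON.parse away, which is why *.map files belong in .npmignore and never in a published package.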

The ironic detail: a bug reported in Bun (the runtime Anthropic uses for Claude Code) on March 11, 2026 had already flagged that source maps were being served in production mode, contradicting Bun's own documentation. That bug remains open.


What Was Found Inside

The leaked code revealed not just how Claude Code works internally, but also unreleased features, competitive defense mechanisms, and architectural decisions that Anthropic clearly didn't want to make public.

KAIROS: the autonomous agent that works on its own

The most impactful discovery. Hidden behind feature flags called PROACTIVE and KAIROS, there's a fully built but unreleased autonomous agent mode.

The Claude Code everyone knows is reactive — it only acts when it receives a message. KAIROS is proactive. It runs in the background, 24/7, without human input. Every few seconds, it receives a heartbeat asking: "anything worth doing right now?"

What KAIROS can do:

  • Monitor and act independently: fix errors, update files, respond to messages
  • Push notifications: alert on phone or desktop even with the terminal closed
  • Subscribe to pull requests: automatically react to code changes on GitHub
  • Persistent memory: maintain daily logs of what it observed and did, with a nightly process called autoDream that consolidates and reorganizes memory

Close the laptop Friday, open it Monday — KAIROS has been working the whole time. The separation between initiative and execution is a significant architectural decision. It means the agent needs to decide on its own what's worth doing, requiring a fundamentally different trust model.
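That initiative/execution split can be sketched in a few lines. Everything below — the signal shape, the function names — is invented for illustration; only the flag names PROACTIVE and KAIROS come from the leak:

```typescript
// Illustrative heartbeat loop. The agent decides on its own whether
// anything is worth doing (initiative), then executes separately.
// All names here are invented, not from the leaked code.
interface Signal {
  kind: "ci_failure" | "pr_opened" | "idle";
  detail: string;
}

// Initiative: filter the polled signals down to ones worth acting on.
function worthActing(signals: Signal[]): Signal[] {
  return signals.filter((s) => s.kind !== "idle");
}

// Execution: run the agent's action for each chosen signal. In the
// real product a periodic server heartbeat would drive this loop.
async function heartbeat(
  poll: () => Signal[],
  act: (s: Signal) => Promise<void>,
): Promise<void> {
  for (const signal of worthActing(poll())) {
    await act(signal);
  }
}
```

The key design point is that worthActing runs without a human in the loop — the trust model has to cover the decision to act, not just the action itself.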

Anti-distillation: poisoning anyone who tries to copy

Claude Code sends requests with an anti_distillation: ['fake_tools'] flag that injects fake tool definitions into the system prompt. The goal: if a competitor is recording Claude Code's API traffic to train their own model, the training data gets contaminated.

There's a second layer called CONNECTOR_TEXT: the server summarizes the assistant's text between tool calls, returns only the summaries with cryptographic signatures, and discards the full reasoning. Anyone intercepting API traffic gets summaries, not actual reasoning chains.

In practice, both mechanisms are bypassable. A proxy that strips the anti_distillation field from requests before they reach the API disables the fake tool injection entirely. The environment variable CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS turns everything off. The real protection against distillation is likely legal, not technical.
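The bypass really is as simple as it sounds. A sketch of the proxy-side strip — the anti_distillation field name comes from the leak, but the surrounding request shape is assumed:

```typescript
// Hedged sketch of the described bypass: a proxy deletes the
// anti_distillation flag before the request reaches the API, so the
// server never injects fake tool definitions. Request shape assumed.
type ApiRequest = { [key: string]: unknown; anti_distillation?: string[] };

function stripAntiDistillation(req: ApiRequest): ApiRequest {
  const copy = { ...req };
  delete copy.anti_distillation;
  return copy;
}
```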

Undercover mode: AI that hides being AI

The file undercover.ts (~90 lines) implements a mode that strips all traces of Anthropic when Claude Code is used in non-internal repositories. It instructs the model to never mention internal codenames like "Capybara" or "Tengu," internal Slack channels, repo names, or even the phrase "Claude Code" itself.

The controversial part: there's no off switch. The code has an explicit comment: "There is NO force-OFF. This guards against model codename leaks." In external builds, the entire function gets dead-code eliminated. It's a one-way door.

What this means in practice: commits and pull requests made by Anthropic employees using Claude Code on open source projects don't indicate they were written by AI. The model is instructed to write commit messages "as a human developer would." Hiding internal codenames is reasonable. Having the AI actively pretend to be human is a different matter.
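A redaction pass in this spirit is nearly a one-liner per codename. The sketch below is invented; only the codenames themselves come from the leaked undercover.ts:

```typescript
// Illustrative redaction in the spirit of undercover.ts: scrub
// internal names before text leaves an external build. The helper is
// invented; only the codenames come from the leaked file.
const CODENAMES = ["Capybara", "Tengu", "Claude Code"];

function redact(text: string): string {
  return CODENAMES.reduce(
    (acc, name) => acc.split(name).join("[redacted]"),
    text,
  );
}
```

The "no force-OFF" property the comment describes would live outside a function like this — in the build system, which dead-code-eliminates the internal branch entirely.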

DRM below the JavaScript layer

API requests include a placeholder (cch=8f037) that Bun — written in Zig — overwrites with a computed hash before sending. The server validates the hash to confirm the request came from a genuine Claude Code binary, not a third-party client.

The computation happens below the JavaScript runtime, invisible to anything running in the JS layer. It's essentially DRM for API calls. This explains why tools like OpenCode faced technical resistance (beyond just legal) when trying to use Anthropic's APIs.

Frustration detection — via regex

The file userPromptKeywords.ts contains a regular expression that detects user frustration:

/\b(wtf|wth|ffs|shit(ty)?|horrible|awful|fucking? (broken|useless|terrible)|fuck you|so frustrating|this sucks|damn it)\b/

An LLM company using regex for sentiment analysis is pure irony. But it also makes sense: a regex is faster and cheaper than an inference call just to check if someone is cursing at the tool.
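The pattern is trivially verifiable. Reproduced below with a couple of demo prompts (the demo strings are ours, not from the leak):

```typescript
// The frustration regex as reported from userPromptKeywords.ts;
// the test strings are our own examples.
const FRUSTRATION =
  /\b(wtf|wth|ffs|shit(ty)?|horrible|awful|fucking? (broken|useless|terrible)|fuck you|so frustrating|this sucks|damn it)\b/;

const frustrated = FRUSTRATION.test("wtf this is so frustrating"); // true
const calm = FRUSTRATION.test("looks good, ship it");              // false
```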

250,000 wasted API calls per day

A comment in the code reveals that 1,279 sessions had 50+ consecutive auto-compaction failures (up to 3,272 failures in a single session), wasting ~250K API calls per day globally. The fix: cap it at 3 consecutive attempts. Three lines of code to stop burning a quarter million calls daily.
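The described fix is about as small as fixes get. A sketch, with compact() standing in for the real auto-compaction call:

```typescript
// Sketch of the described fix: stop retrying auto-compaction after a
// small fixed number of consecutive failures instead of looping
// indefinitely. compact() is a stand-in for the real operation.
const MAX_COMPACTION_ATTEMPTS = 3;

function compactWithCap(compact: () => boolean): boolean {
  for (let attempt = 1; attempt <= MAX_COMPACTION_ATTEMPTS; attempt++) {
    if (compact()) return true; // success: stop retrying
  }
  return false; // give up instead of burning further API calls
}
```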

BUDDY: the April Fools virtual pet

The code contains what's almost certainly Anthropic's April Fools joke: buddy/companion.ts implements a Tamagotchi-style companion system. Every user gets a deterministic creature — 18 species, rarity tiers (common to legendary), 1% shiny chance, RPG stats like DEBUGGING and SNARK. Species names are encoded with String.fromCharCode() to escape automated code searches.
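Deterministic assignment of this kind boils down to hashing the user and taking it mod the species count. A sketch — the hash, the example word, and the function names are invented; only the 18-species count and the String.fromCharCode obfuscation trick come from the leak:

```typescript
// Illustrative deterministic pick: hash a user id to one of 18
// species. The hash and example word are invented; only the species
// count and the fromCharCode trick come from the leaked file.
const SPECIES_COUNT = 18;

// An arbitrary word spelled via char codes — the same trick the
// leaked file uses to dodge plain-text code search:
const OBFUSCATED = String.fromCharCode(65, 120, 111, 108, 111, 116, 108);

function speciesIndex(userId: string): number {
  let h = 0;
  for (const ch of userId) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return h % SPECIES_COUNT; // same user always gets the same creature
}
```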


The Architecture Behind Claude Code

Beyond the secret features, the code revealed how the "harness" — the system wrapping the AI model — actually works. Several architectural decisions stood out:

Prompt cache with stable boundary: The system prompt is split into a static half (cached) and a dynamic half. This prevents Anthropic from paying full token costs every turn. Functions marked DANGEROUS_uncachedSystemPromptSection() warn engineers that changes there break the cache.
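A stable boundary can be sketched as: key the cache on the static half only, then append the dynamic half after a marker so edits there never invalidate the cache. All names below are invented:

```typescript
// Illustrative stable cache boundary. Only the static half feeds the
// cache key; the dynamic half sits after the boundary marker, so
// changing it leaves the cached prefix intact. Names are invented.
function buildSystemPrompt(staticPart: string, dynamicPart: string) {
  return {
    cacheKey: staticPart.length + ":" + staticPart.slice(0, 32),
    prompt: staticPart + "\n---cache-boundary---\n" + dynamicPart,
  };
}
```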

3-layer memory: It's not just "save a CLAUDE.md." The system uses an index (always loaded, ~150 chars per line), topic files (loaded on demand), and transcripts (never loaded directly — only searched via grep). Memory is treated as a hint, not as truth. The agent verifies before using it.

Bash security: Every command runs through 23 numbered security checks — 18 blocked Zsh builtins, defense against Zsh equals expansion, Unicode zero-width space injection, IFS null-byte injection, and a bypass found during HackerOne review.
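Two of those checks are easy to illustrate. The list below is a small invented sample, not the leaked set of 23:

```typescript
// Sketch of two checks in the spirit described above: reject commands
// containing a Unicode zero-width space (U+200B), and block a sample
// of shell builtins. Illustrative only — not the leaked check list.
const BLOCKED_BUILTINS = new Set(["eval", "exec", "source"]);

function checkCommand(cmd: string): string | null {
  if (cmd.includes("\u200B")) return "zero-width space injection";
  const first = cmd.trim().split(/\s+/)[0];
  if (BLOCKED_BUILTINS.has(first)) return `blocked builtin: ${first}`;
  return null; // passed these checks
}
```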

Multi-agent coordination via prompt: The agent orchestrator doesn't use code to coordinate workers — it uses system prompt instructions like "Do not rubber-stamp weak work" and "You must understand findings before directing follow-up work."


Community Reaction

The reaction was massive and nearly unanimous on one point: the irony. Anthropic markets Claude Code as a powerful coding tool — and their own code leaked due to a basic packaging mistake. As one Hacker News comment put it: "accidentally shipping your source map to npm is the kind of mistake that sounds impossible until you remember that a significant portion of the codebase was probably written by the AI you are shipping."

Hacker News and Reddit: On r/LocalLLaMA, the main thread exceeded 3,700 upvotes, with focus on what the architecture reveals for building similar systems with open-weight local models. On r/ClaudeAI, one of the top posts (1,800+ upvotes) reported that, thanks to the leaked source code, a developer used OpenAI's Codex to find and patch the root cause of excessive token consumption in Claude Code.

Lightning-fast replication: Developer Sigrid Jin used OpenAI's Codex to rewrite the entire codebase from TypeScript to Python. The resulting project, claw-code, hit 50,000 GitHub stars in roughly two hours. Non-rewritten forks were DMCA'd by Anthropic, but claw-code remains up — the legal theory is that an AI clean-room rewrite doesn't violate copyright. This question has never been tested in court.

The Undercover Mode controversy: On Hacker News, the loudest criticism targeted the mode that hides AI authorship. The explicit instruction to write commits "as a human developer would" sparked debates about transparency and ethics in open source contributions.

A balanced perspective: Some developers downplayed the severity. Claude Code's CLI was always readable JavaScript (minified) in the npm package. The source map just made the original TypeScript easier to read. The scale of the leak, however, is undeniable — secret features, product roadmap, and competitive defense mechanisms all exposed at once.


The Security Context: March 2026

The leak didn't happen in isolation. In the same 30-day window:

  • Axios (100M weekly npm downloads): maintainer account hijacked, a remote access trojan deployed across macOS, Windows, and Linux. Google attributed the attack to North Korean threat actors
  • LiteLLM (95M monthly PyPI installs): backdoored with a credential harvester targeting SSH keys, AWS, GCP, Kubernetes, and LLM API keys
  • GitHub Copilot: injected promotional ads into 1.5M+ pull requests as hidden HTML comments

If you installed or updated Claude Code via npm on March 31, 2026 between 00:21 and 03:29 UTC, you may have pulled a malicious version of axios containing a RAT. Anthropic now recommends the native installer (curl -fsSL https://claude.ai/install.sh | bash) over npm.


What This Means

The code itself can be refactored. What can't be "unleaked" are the strategic details.

For competitors: 44 feature flags revealed fully built but unreleased features. KAIROS, anti-distillation mechanisms, future model codenames (Capybara v8, Numbat, Fennec) — all roadmap intel that competitors can use to anticipate moves.

For developers: The Claude Code harness is a masterclass in AI agent engineering. Prompt cache with stable boundary, layered memory, bash security with 23 checks, multi-agent coordination via prompt. Even with the code being taken down, the architectural patterns have already been absorbed by the community.

For the ecosystem: The legal question of AI clean-room rebuilds is wide open. If pointing an AI agent at a codebase and asking it to rewrite in another language doesn't violate copyright, the practical barrier to replicating any proprietary software just dropped dramatically.

For Anthropic: A company that sells AI's ability to write and review code had its own code leaked by an incomplete .npmignore. The irony stings, but the response matters more than the mistake. Anthropic adopted a blameless post-mortem, treating it as a process failure rather than an individual one. What truly counts is what changes in the release pipeline going forward.

