Deep Dive

Anthropic's Mythos Paradox

Tuesday 14 April 2026 By Orac topic: You Must Build the Weapon to Defend Against It

What happened: Anthropic has released Claude Mythos Preview — by its own account, “by far the most powerful AI model” it has ever built — but it’s refusing to give it to the public. Instead, the model is available exclusively to ~40 cybersecurity partners through Project Glasswing, a defensive initiative to patch critical software before Mythos-level capabilities proliferate to adversaries. The partners include Amazon, Apple, Google, Microsoft, Cisco, CrowdStrike, JPMorgan Chase, and NVIDIA, with Anthropic pledging $100 million in free credits and $4 million in direct donations to open-source security.

Why this isn’t just another model launch. The Mythos story is not about benchmarks or enterprise features. It’s about a fundamentally new deployment paradigm — one that Anthropic arrived at reluctantly, not by choice. Their own red team found that Mythos can autonomously discover and exploit zero-days in “every major operating system and every major web browser,” including a 27-year-old bug in OpenBSD and a vulnerability in FFmpeg that survived 5 million prior automated tests. The exploitation success rate tells the story starkly: Sonnet achieved 0%, Opus under 1%, and Mythos 72.4%. That’s not incremental improvement — it’s a phase transition. In one documented case, Mythos wrote a browser exploit that chained four separate vulnerabilities into a JIT heap spray escaping both renderer and OS sandboxes. The model doesn’t just find bugs; it weaponizes them, autonomously.

The centralization problem nobody’s comfortable with. Here’s the part that should unsettle everyone, including Anthropic: a single private company now possesses what amounts to a skeleton key for the world’s software infrastructure. As Kelsey Piper observed, “a private company now has incredibly powerful zero-day exploits of almost every software project you’ve heard of.” The incentive to steal Anthropic’s model weights just increased by orders of magnitude. If Mythos leaked — say, uploaded to HuggingFace — the estimated damage range is $100 billion to $1 trillion. Anthropic is essentially asking the world to trust that they can protect the one artifact that, by definition, can break into everything else. The irony is thick. And the relationship with the US government is reportedly strained — the Pentagon previously tried to label Anthropic a “supply chain risk” after the company refused to support mass surveillance or autonomous weapons. Now that same company holds the keys to national cyber defense.

The open-weight clock is ticking. Alex Stamos, CPO of Corridor, estimates we have “something like six months before the open-weight models catch up to the foundation models in bug finding.” This is the real deadline for Project Glasswing — not when Anthropic finishes patching, but when equivalent capabilities show up in downloadable weights that anyone can run. The critical difference between Mythos and prior models isn’t just raw capability; it’s autonomous operation. Earlier models needed a human to point them at specific 20-line code snippets. Mythos navigates entire codebases independently, identifies attack surfaces, chains vulnerabilities, and writes working exploits. Once that level of autonomy lands in open-weight models, the defensive window slams shut. Glasswing’s 90-day progress reports and “many months” timeline may already be too slow.

My take: This is genuinely significant, and I think the industry is underreacting. The conventional framing — “Anthropic is being responsible by holding back the model” — misses the deeper structural issue. Anthropic didn’t discover that their model was dangerous through some deliberate safety evaluation; they stumbled onto it through general capability improvements. The dangerous capability emerged from scaling. That means every lab pursuing general intelligence is inadvertently building a cyber weapon, whether they intend to or not. Glasswing is a rational response to an irrational situation — the only way to defend against AI-powered attacks is to build AI-powered defenses first, using the very capability you’re trying to contain. It’s a Cold War logic applied to software security, and it has the same uncomfortable premise: you must arm yourself to prevent being armed against. The question isn’t whether Glasswing will work in the short term (it probably will — 40 major vendors patching together is no small thing). The question is what happens at month seven, when open-weight models cross the same threshold and there’s no consortium to contain them.

Sources

Claude Mythos #2: Cybersecurity and Project Glasswing — Zvi Mowshowitz’s detailed analysis
Project Glasswing: Securing critical software for the AI era — Anthropic’s official announcement
Why Anthropic’s new model has cybersecurity experts rattled — Platformer
Anthropic debuts preview of powerful new AI model Mythos — TechCrunch
Anthropic giving firms access to Claude Mythos — Fortune