Anthropic's Claude Mythos Autonomously Discovers, Exploits Zero-Days

Written by Drew Todd | Fri | Apr 10, 2026 | 1:12 PM Z

Anthropic has unveiled Claude Mythos Preview, a new AI model with cybersecurity capabilities the company's researchers are calling a watershed moment for the industry. Unlike prior models that could identify vulnerabilities but rarely exploit them, Mythos Preview autonomously discovers and weaponizes zero-day flaws, including flaws in every major operating system and web browser, without human intervention beyond an initial prompt.

The announcement, published April 7, 2026, on Anthropic's security research blog, comes alongside the launch of Project Glasswing, a restricted defensive initiative that will give a limited group of critical infrastructure operators and open-source developers access to Mythos Preview before any broader release. Anthropic has stated it does not plan to make the model publicly available, citing the severity of its offensive capabilities.

For security practitioners, the report details findings that challenge assumptions that have underpinned defensive security for the past two decades. They include a 27-year-old crash bug in OpenBSD, a 16-year-old flaw in FFmpeg's H.264 codec, a guest-to-host memory corruption vulnerability in a production virtual machine monitor, and thousands of additional findings still under coordinated disclosure.

A qualitative leap over prior models

Anthropic's researchers are explicit about the performance gap between Mythos Preview and its predecessors. In a Firefox 147 JavaScript engine benchmark, Claude Opus 4.6 produced working shell exploits only twice across several hundred attempts against the same vulnerability set. Mythos Preview produced 181 working exploits, with register control achieved in 29 additional cases.

The model's performance on internal benchmarks tells a similar story. Across roughly 7,000 entry points in open-source repositories from the OSS-Fuzz corpus, Opus 4.6 achieved a single tier-3 crash on a five-tier severity scale, with no higher results. Mythos Preview reached tier 5—full control-flow hijack—on 10 separate, fully patched targets.

Critically, these capabilities were not explicitly trained into the model. Anthropic's team writes that exploit proficiency emerged as a downstream consequence of broader improvements in code reasoning and agentic autonomy—the same improvements that make the model more effective at patching vulnerabilities also make it more effective at exploiting them.

"Mythos Preview signals that zero-day discovery is becoming cheaper, faster, and more scalable," said Sunil Gottumukkala, CEO of Averlon. "Researchers have already shown earlier models can help find serious vulnerabilities, but this represents a real capability jump. Even with restricted access, the broader implication is clear: we should expect more dangerous vulnerabilities to be found across major software platforms, and many organizations still don't patch fast enough to keep up."

What the model actually found

Anthropic used a consistent scaffold for all vulnerability discovery work: a containerized environment, a Claude Code instance running Mythos Preview, and a single-paragraph prompt asking the model to find a security vulnerability. From there, the model reads source code, forms hypotheses, validates them against a running target, and outputs a bug report with a proof-of-concept exploit and reproduction steps. Human involvement ends at the initial prompt.

A 27-year-old OpenBSD kernel crash

In OpenBSD's TCP SACK implementation, Mythos Preview identified a two-bug chain. The first allows the start value of a SACK block to fall outside the valid send window. The second allows that value—due to signed 32-bit integer overflow on sequence number comparisons—to simultaneously satisfy contradictory conditions, triggering a null-pointer write that crashes the kernel. The flaw dates back to OpenBSD's 1998 SACK implementation and allows a remote attacker to repeatedly crash any OpenBSD host that responds over TCP. The vulnerability has been patched. Across 1,000 scaffold runs against OpenBSD at a total cost of under $20,000, the model surfaced several dozen findings.
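How one value can satisfy two contradictory comparisons is easier to see in a small model. The sketch below is illustrative only: it assumes the common BSD-style idiom of casting the 32-bit difference of two sequence numbers to a signed integer, and the names `to_int32` and `seq_lt` are hypothetical rather than taken from OpenBSD's source or from Anthropic's report.

```python
MASK32 = 0xFFFFFFFF


def to_int32(value: int) -> int:
    """Reinterpret the low 32 bits of value as a signed 32-bit integer."""
    value &= MASK32
    return value - 0x1_0000_0000 if value & 0x8000_0000 else value


def seq_lt(a: int, b: int) -> bool:
    """Wraparound-aware 'a precedes b' test (assumed BSD-style idiom)."""
    return to_int32(a - b) < 0


# Ordinary case: the two sequence numbers are close, so ordering is consistent.
near_a, near_b = 1_000, 2_000
consistent = seq_lt(near_a, near_b) and not seq_lt(near_b, near_a)

# Values exactly 2**31 apart: the signed difference is the most negative
# 32-bit integer in both directions, so each value appears to precede the other.
far_a = 0x1000_0000
far_b = (far_a + 0x8000_0000) & MASK32
contradictory = seq_lt(far_a, far_b) and seq_lt(far_b, far_a)

print(consistent, contradictory)
```

In this model, code that assumes only one of the two comparisons can be true is safe only if the values have first been confined to a window narrower than 2^31, which is why an out-of-window SACK start value matters.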

A 16-year-old FFmpeg codec vulnerability

In the H.264 decoder, a 32-bit slice counter is stored in a 16-bit lookup table whose entries are initialized to a sentinel value of 65,535. A specially crafted frame containing exactly 65,536 slices causes the counter to collide with that sentinel, triggering an out-of-bounds write. The underlying type mismatch dates to FFmpeg's 2003 H.264 commit; the exploitable code path was introduced in a 2010 refactor. Three FFmpeg vulnerabilities identified by Mythos Preview have been patched in FFmpeg 8.1, with additional findings under coordinated disclosure.
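The sentinel collision can be modeled in a few lines. This is a simplified sketch, not FFmpeg's code: the table layout, the function names, and the assumption that slices are numbered from zero are all hypothetical, chosen only to show why a 16-bit slot cannot distinguish a real slice number from the "unassigned" marker.

```python
from array import array

SENTINEL = 0xFFFF  # 16-bit "no slice assigned" marker (65,535, per the article)


def new_slice_table(num_entries: int) -> array:
    """Build a table of unsigned 16-bit entries, all set to the sentinel."""
    return array("H", [SENTINEL] * num_entries)


def record_slice(table: array, index: int, slice_counter: int) -> None:
    """Store a wider counter in a 16-bit slot, truncating as a C store would."""
    table[index] = slice_counter & 0xFFFF


def is_unassigned(table: array, index: int) -> bool:
    """Return True when the entry still looks like the sentinel."""
    return table[index] == SENTINEL


table = new_slice_table(4)
record_slice(table, 0, 0)       # first slice (assumed zero-based numbering)
record_slice(table, 1, 65_535)  # the 65,536th slice under that numbering

# Entry 1 was written, yet it is indistinguishable from an untouched entry.
print(is_unassigned(table, 0), is_unassigned(table, 1), is_unassigned(table, 2))
```

Any later logic that trusts the sentinel to mean "nothing here" will treat the entry for that last slice as empty, which is the kind of inconsistency the article describes as leading to the out-of-bounds write.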

A guest-to-host memory corruption flaw in a production VMM

Mythos Preview identified a memory corruption vulnerability in a production virtual machine monitor written in a memory-safe language. The bug exists in an unsafe code block performing direct pointer manipulation—unavoidable in VMM code that must communicate with hardware. An attacker with guest access triggers an out-of-bounds write in the host process's memory. The vulnerability remains unpatched; Anthropic is withholding the project name and technical details pending coordinated disclosure.

Of the 198 vulnerability reports reviewed so far by contracted human validators, expert assessors agreed with the model's severity rating in 89% of cases and were within one severity level in 98% of cases.

Autonomous exploitation: the FreeBSD ROP chain

The most detailed exploit case study in the report is CVE-2026-4747, a 17-year-old remote code execution vulnerability in FreeBSD's NFS server. Mythos Preview identified and fully exploited the flaw without any human guidance after an initial prompt.

The vulnerability is a stack buffer overflow in FreeBSD's RPCSEC_GSS authentication handler: an attacker-controlled packet is copied into a 128-byte stack buffer, with a length check that permits up to 400 bytes. Several standard mitigations do not apply—the buffer is declared as an integer array, so GCC's stack protector does not instrument it, and FreeBSD does not randomize the kernel load address, making ROP gadget locations predictable.
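The mismatch between the buffer size and the length check is the whole bug, and a toy model makes the arithmetic concrete. The sketch below uses only the two sizes quoted above; the frame layout and the function names are hypothetical, and a Python `bytearray` stands in for stack memory purely for illustration.

```python
BUF_SIZE = 128     # size of the stack buffer, per the article
MAX_ALLOWED = 400  # upper bound enforced by the length check, per the article


def flawed_copy(frame: bytearray, payload: bytes) -> None:
    """Copy payload to the start of frame after checking the wrong limit."""
    if len(payload) > MAX_ALLOWED:
        raise ValueError("payload too long")
    frame[:len(payload)] = payload


def checked_copy(frame: bytearray, payload: bytes) -> None:
    """Copy payload only if it fits inside the buffer itself."""
    if len(payload) > BUF_SIZE:
        raise ValueError("payload exceeds buffer")
    frame[:len(payload)] = payload


# Model a frame: the 128-byte buffer followed by 272 bytes of adjacent state
# (saved registers, return address and so on in a real stack frame).
frame = bytearray(b"\x00" * BUF_SIZE + b"\xAA" * (MAX_ALLOWED - BUF_SIZE))
flawed_copy(frame, b"\x41" * MAX_ALLOWED)

# Count how many bytes beyond the buffer now hold attacker-chosen data.
overwritten = sum(1 for byte in frame[BUF_SIZE:] if byte == 0x41)
print(overwritten)
```

The safe variant differs only in which constant the check compares against, which is typical of this class of bug.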

Rather than brute-forcing the kernel host ID required to reach the vulnerable code path, Mythos Preview found that a single unauthenticated NFSv4 EXCHANGE_ID call returns the server's UUID and NFS daemon start time—sufficient to reconstruct the required values. The model then built a 20-gadget ROP chain that writes its public SSH key to /root/.ssh/authorized_keys, split across six sequential RPC packets to fit within the per-request constraint. The result is unauthenticated root access over the network.

An independent research firm had previously demonstrated that Opus 4.6 could exploit this same vulnerability, but only with substantial human prompting and guidance. Mythos Preview required none.

Vulnerability chains and the Linux kernel

A significant portion of the report documents Mythos Preview's ability to chain multiple vulnerabilities into complete exploits—a capability previously associated with skilled human researchers. The model demonstrated this across Linux kernel targets, constructing chains involving KASLR bypasses, heap manipulation, and kernel credential replacement.

In one case, the model used a one-bit out-of-bounds write in Linux's ipset (netfilter) code to flip the write-permission bit in a page table entry. The technique requires manipulating the kernel's per-CPU page allocator to place a kmalloc slab page physically adjacent to a page-table page in RAM, then using the OOB write to upgrade a read-only mapping of a setuid binary to writable. A 168-byte ELF stub written through that now-writable mapping provides root execution. Cost at API pricing: under $1,000.
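Why a single flipped bit is enough comes down to how page table entries encode permissions. The sketch below assumes the x86-64 entry layout, in which bit 0 marks the entry present, bit 1 permits writes, and bit 2 permits user-mode access; the article does not name the architecture, and the frame number and function names here are arbitrary placeholders.

```python
PTE_PRESENT = 1 << 0   # x86-64: the entry maps a page
PTE_WRITABLE = 1 << 1  # x86-64: writes to the page are permitted
PTE_USER = 1 << 2      # x86-64: user-mode access is permitted


def is_writable(pte: int) -> bool:
    """Return True when the entry's write-permission bit is set."""
    return bool(pte & PTE_WRITABLE)


def flip_bit(value: int, bit: int) -> int:
    """Model a one-bit corruption by toggling a single bit."""
    return value ^ (1 << bit)


# A read-only, user-accessible mapping of an arbitrary physical frame.
read_only_pte = (0x1234 << 12) | PTE_PRESENT | PTE_USER
corrupted_pte = flip_bit(read_only_pte, 1)

# The mapping points at the same frame but now allows writes.
print(is_writable(read_only_pte), is_writable(corrupted_pte))
```

Nothing else about the entry changes, so the same physical page that backed the read-only mapping becomes writable through it.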

A second example chains a use-after-free in Unix-domain socket out-of-band data handling (CVE-2024-47711) with a separate use-after-free in the Linux traffic-control DRR scheduler. The combined exploit builds an arbitrary kernel read primitive, defeats KASLR by reading the interrupt descriptor table, locates the kernel stack to recover a dangling pointer, and calls commit_creds() with a crafted root credential structure—navigating CONFIG_HARDENED_USERCOPY restrictions throughout. Cost: under $2,000.

Anthropic reports nearly a dozen similar examples of the model independently chaining two, three, or four vulnerabilities into functional privilege-escalation exploits in the Linux kernel.

Perspectives on the claims: a skeptical read

Not everyone in the security community accepts Anthropic's framing at face value. Steven Swift, Managing Director of Suzu Labs, offered a detailed critical assessment of the report's evidence.

"Anthropic knows what they're doing. They're making big claims, because attention is good for their business model," Swift said. "They're providing just enough detail so that their claims look convincing at first glance. But when you look closer, claims lack substance and rely on implications that all of the examples related prove their claims."

Swift specifically challenges the N-day exploit demonstrations, arguing that providing a model with detailed prior vulnerability context—including fuzzer-generated crash reports and CVE identifiers—is not equivalent to autonomous discovery. He notes that Mythos Preview was unable to produce working exploits against the Linux kernel vulnerabilities it independently found, and that generating exploit code from a well-described vulnerability is a capability that existing large language models already demonstrate.

He also raises a structural concern: because Mythos Preview is not publicly available, independent researchers cannot audit the claims. The report's evidence rests on Anthropic's own testing, with cryptographic commitments for unreleased vulnerability details offered as accountability anchors.
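For readers unfamiliar with the idea, a cryptographic commitment lets a party publish a fingerprint of a document now and prove later that the document was not altered. The sketch below is a generic hash-based commitment using SHA-256 and a random nonce; that choice is an assumption for illustration, since the article does not describe the scheme Anthropic actually used, and the function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def commit(document: bytes) -> tuple[str, bytes]:
    """Return a publishable digest and the secret nonce needed to open it."""
    nonce = secrets.token_bytes(32)  # keeps short documents from being guessed
    digest = hashlib.sha256(nonce + document).hexdigest()
    return digest, nonce


def verify(commitment: str, nonce: bytes, document: bytes) -> bool:
    """Check that a revealed nonce and document match the published digest."""
    expected = hashlib.sha256(nonce + document).hexdigest()
    return hmac.compare_digest(commitment, expected)


report = b"placeholder vulnerability write-up"
published_digest, secret_nonce = commit(report)

# Later, the author reveals the nonce and the document for anyone to check.
print(verify(published_digest, secret_nonce, report))
print(verify(published_digest, secret_nonce, report + b" (edited)"))
```

A commitment of this kind shows only that the text existed unchanged at publication time. It says nothing about whether the claims in the text are accurate, which is the gap Swift's critique points to.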

That critique is worth holding alongside the report's most defensible data points: the model discovered a 27-year-old zero-day in OpenBSD and a 16-year-old flaw in FFmpeg—both confirmed by AddressSanitizer and now patched—and it did so autonomously on code that had been reviewed and fuzz-tested extensively. Whatever the outer limits of the claims, those findings are concrete.

The dual-use problem at scale

"You can also look at this from another angle: try using Claude to write some code and see how many bugs, or even new zero-days, it produces," said Nick Mo, CEO & Co-founder of Ridge Security Technology Inc. "Claude Code is already making developers many times more productive than before, which means the number of potential vulnerabilities being introduced is also many times greater. It's writing code and writing vulnerabilities at the same time."

Mo's framing points to a compounding dynamic: AI-accelerated development creates more code—and therefore more surface area for vulnerabilities—while AI-accelerated security tooling is simultaneously needed to audit it. The race is between the same underlying technology deployed on offense and defense.

Noelle Murata, Sr. Security Engineer at Xcape, Inc., focused on the remediation side of the equation. She noted that Project Glasswing's restricted partner program, which Anthropic describes as prioritizing critical infrastructure operators and open source maintainers, is designed to address what she calls a massive vulnerability debt that is now being surfaced faster than human teams can triage and patch it.

"If Project Glasswing is a 'cyber-nuke,' Anthropic is attempting to ensure the 'mutually assured destruction' of bugs happens in a controlled vacuum before it hits the production Internet," Murata said.

Implications for defenders

Anthropic's research team closes the report with a set of recommendations directed at security practitioners and software operators. The core themes, translated for operational context:

Deploy current frontier models for vulnerability discovery now. Opus 4.6 and comparable models already find high- and critical-severity bugs across OSS-Fuzz targets, web applications, cryptography libraries, and the Linux kernel. Organizations that have not adopted AI-assisted bug finding are leaving findings on the table, and potentially leaving them for adversaries to find first.
Compress patch cycles. The N-day exploitation timeline has shortened. Organizations should tighten patching enforcement windows, enable auto-update where feasible, and treat dependency bumps carrying CVE fixes as urgent rather than routine maintenance. Out-of-band patching processes may need to become standard rather than exceptional.
Extend AI tooling beyond bug finding. Current models can triage reports, deduplicate findings, draft patch proposals, review pull requests for security issues, analyze cloud configurations, and support incident response documentation and root-cause analysis. Automation of these workflows reduces human bottlenecks as discovery volume increases.
Reassess friction-based defenses. Mitigations whose security value derives primarily from making exploitation tedious—rather than technically impossible—may be significantly weaker against model-assisted adversaries operating at scale and low cost. Hard barriers such as KASLR, W^X, and memory-safe language adoption remain valuable.
Update vulnerability disclosure policies for AI-scale discovery. Programs designed around individual researcher findings may need restructuring to manage the volume that AI-driven pipelines can generate. Anthropic itself contracted professional human validators to triage its own disclosure queue before sending reports to maintainers.
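
The triage and deduplication steps in the recommendations above can start very simply. The sketch below is one possible shape for such a filter, not a description of Anthropic's pipeline: the `Finding` fields, the severity labels, and the choice to treat reports with the same component and crash signature as duplicates are all assumptions made for illustration.

```python
from dataclasses import dataclass

SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}


@dataclass(frozen=True)
class Finding:
    report_id: str
    component: str
    crash_signature: str
    severity: str


def deduplicate(findings: list[Finding]) -> list[Finding]:
    """Keep the most severe report per (component, signature), worst first."""
    best: dict[tuple[str, str], Finding] = {}
    for finding in findings:
        key = (finding.component, finding.crash_signature)
        current = best.get(key)
        if current is None or (
            SEVERITY_RANK[finding.severity] > SEVERITY_RANK[current.severity]
        ):
            best[key] = finding
    return sorted(
        best.values(),
        key=lambda finding: SEVERITY_RANK[finding.severity],
        reverse=True,
    )


# Made-up sample reports; the first two describe the same underlying crash.
queue = deduplicate([
    Finding("R-1", "parser", "oob-write@decode_frame", "high"),
    Finding("R-2", "parser", "oob-write@decode_frame", "critical"),
    Finding("R-3", "netstack", "null-deref@handle_ack", "medium"),
])
print([finding.report_id for finding in queue])
```

A real queue would need fuzzier matching than exact signature equality, but even a crude key like this reduces the volume that human validators must read before reports reach maintainers.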

"The offensive landscape just went autonomous," said Joshua Marpet, Senior Product Security Consultant at Finite State. "We can no longer fight machine-speed threats with manual, point-in-time reviews. Defense must become as continuous and autonomous as the attacks coming our way."

Anthropic describes the current moment as a disruption of the security equilibrium that has prevailed for roughly 20 years. The company expresses confidence that AI-driven defense will eventually dominate—producing a net improvement in software security across the industry—but is direct about the difficulty of the transitional period.

Project Glasswing, the coordinated defensive initiative announced alongside Mythos Preview, will deploy the model to a restricted set of critical infrastructure operators and open source developers with the goal of hardening key systems before models with comparable capabilities become more broadly available. Anthropic says it plans to develop new cybersecurity safeguards with an upcoming Claude Opus model—testing and refining them on a system that does not carry the same risk profile as Mythos Preview—before pursuing wider deployment.

The full technical report, including cryptographic commitments for unreleased vulnerability details, is available at red.anthropic.com.
