spoon ai
Research · Anthropic · Cybersecurity · Claude

Anthropic Unveils 'Project Glasswing' – Mythos Model Found 7 Real Zero-Days

Anthropic's Project Glasswing is a defensive cybersecurity program built on its internal Mythos model, which turned up seven real zero-days during red-team testing.

Anthropic Project Glasswing announcement (image: Anthropic)

The Hook

On April 22, Anthropic formally announced "Project Glasswing", a defensive cybersecurity program, together with an internal, security-specialized model called "Mythos". The punchline: during three months of red-team testing, Mythos found seven real zero-day vulnerabilities in live open-source projects.

Four of the seven are rated CVSS 9.0 or higher. One is a remote-code-execution bug in an infrastructure library with roughly 800 million deployments worldwide. Anthropic observed a 90-day responsible-disclosure window and timed this announcement to the day after the vendor's patch shipped. This is the first time "AI found a real zero-day" has been proven by patch timeline rather than by a research paper.

Context You Need

Since 2023, "AI for vulnerability research" has been a hot corner of security research – Google Project Naptime, DARPA AIxCC, Microsoft Security Copilot. The common ceiling: models could generate plausible-looking bug candidates, but every pipeline stalled at validation – confirming that a candidate was actually exploitable.

Anthropic's approach breaks that bottleneck in three layers. First, pretraining included the entire CVE database, all of Exploit-DB, and around 400,000 curated vulnerability reports. Second, during reinforcement learning, fuzzers and debuggers were attached as tools, so every candidate bug the model produced got executed in a sandbox for PoC verification. Third, human red-teamers were paired with the model in a loop where the model proposes, humans triage, and the model refines.

It is the second layer that separates Glasswing from Project Naptime. Instead of "read code, guess a bug", Mythos is trained on the feedback of actually running things. The workflow that vulnerability researchers have always done by hand is now the training signal.
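Anthropic has not published Glasswing's training code; as a rough sketch only, the execute-to-verify reward loop described above could be wired like this, where `Candidate`, `toy_sandbox`, and the reward shape are all hypothetical stand-ins:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A model-proposed bug: a target location plus a proof-of-concept input."""
    location: str  # e.g. "lib/parser.c:frame_decode"
    poc: bytes     # input the model claims triggers the bug


def toy_sandbox(candidate: Candidate) -> bool:
    """Stand-in for running the PoC under a fuzzer/debugger in a sandbox.
    Here a PoC 'crashes' iff it contains a magic byte pattern; the real
    system would do an actual sandboxed execution with sanitizers."""
    return b"\xde\xad" in candidate.poc


def reward_candidates(candidates: list[Candidate], execute=toy_sandbox) -> list[float]:
    """Reward only candidates whose PoC verifiably fires when executed.
    Unverifiable guesses score zero - that gate is the training signal
    separating 'plausible-looking bug' from 'real bug'."""
    return [1.0 if execute(c) else 0.0 for c in candidates]
```

The point of the sketch is the reward gate: the model is never rewarded for describing a bug, only for a PoC that actually fires when executed.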

Anatomy of the Model

Mythos is a separate fine-tune branch from the public Claude lineup. The base is Claude Sonnet 4.6, and its general math and coding scores come out a touch lower – it spends that capability budget on security benchmarks instead.

Benchmark                Mythos   Claude Opus 4.6   GPT-5
CyberSecEval 3           84.1%    71.0%             68.3%
SecBench Exploit         62.4%    38.1%             33.9%
DARPA AIxCC Final        8/10     4/10              3/10
Human Red Team Overlap   73%      41%               35%

Human Red Team Overlap is Anthropic's own metric: the fraction of bugs that a skilled human researcher would have found that the model also finds independently. Seventy-three percent is close enough to "replaces one researcher" to matter.
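Anthropic does not publish a formula for the metric; read literally as a fraction of the human-found set, it reduces to a set intersection. A minimal sketch with made-up bug identifiers:

```python
def red_team_overlap(human_bugs: set[str], model_bugs: set[str]) -> float:
    """Fraction of bugs the human red team found that the model also
    found independently. The denominator is the human set, so extra
    bugs the model finds that humans missed do not raise the score."""
    if not human_bugs:
        return 0.0
    return len(human_bugs & model_bugs) / len(human_bugs)


# Humans found four bugs; the model rediscovered three of them.
overlap = red_team_overlap(
    {"bug-A", "bug-B", "bug-C", "bug-D"},
    {"bug-A", "bug-B", "bug-C", "bug-X"},
)
# overlap == 0.75
```

Note the asymmetry: "bug-X", which only the model found, counts for nothing here, which makes this a measure of replacement rather than of total coverage.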

The seven zero-days break down as follows.

  • 1× network library RCE (Critical)
  • 1× Linux kernel LPE (Critical)
  • 1× web framework SSRF→RCE chain (Critical)
  • 1× container runtime escape (Critical)
  • 2× parser heap overflow (High)
  • 1× authentication bypass (High)

The network library RCE is the case Anthropic leans on hardest when pitching Glasswing to enterprise customers. A conventional pentest engagement running five or six engineers for three months typically would not have found this one.

The Bigger Picture

Timing matters. Earlier this month CISA issued an advisory warning that attackers are starting to use AI-assisted vulnerability research tooling. Last week a Mandiant report described a Chinese APT group using an open-source model to auto-discover three vulnerabilities.

The classic asymmetry – defender has to block all bugs, attacker only needs one – is eroding. Once AI can sweep the same codebase for both sides, the game collapses to "whoever finds it first". And "finds it first" scales with inference compute.

Anthropic is selling Glasswing as a $2M/year enterprise program. For a Fortune 500 security org, that is the loaded cost of two or three researchers, and if Mythos clocks 73% human overlap, the ROI math is trivial. The real comparison is not to EDR vendors like SentinelOne or CrowdStrike – it is to pentest and bug-bounty platforms like Synack and Bugcrowd.

OpenAI and Google are clearly working the same problem. OpenAI previewed "CodeScan" at DEF CON in January; Google is rumored to be on a second generation of Project Naptime. Anthropic has now planted a flag – the next six months will be about who publishes the next real zero-day.

What Actually Changes

Three threads to watch.

First, the security model of open-source projects flips. "Maintainer audits when they have time" becomes "model audits continuously". Expect Linux Foundation and OpenSSF to ask Anthropic for some form of public-access tier. This is already showing up in the EU Cyber Resilience Act's second amendment cycle.

Second, bug-bounty pricing breaks. Critical RCEs currently pay $50K–$200K on average. If a model can do the same work at near-zero marginal cost, the payouts to individual researchers compress fast. What survives is the "exotic bug artisan" tier – kernel race conditions, hardware bugs, side channels – where Mythos is still weak.

Third, from an AI safety angle this is a yellow flag. Mythos is marketed as defensive, but the capability itself is dual-use. Anthropic explicitly does not expose Mythos through the API and instead sells it as a service. The subtext: if the weights leak, the board flips. Hence the explicit reference in Anthropic's post to "the ASL-4 threshold in our Responsible Scaling Policy".

If this week's Meta MCI employee surveillance program is the extreme of data collection, Glasswing is the extreme of capability concentration. Both are downstream of the same capex race.
