
Shannon Lite — an autonomous AI pentester that runs real exploits

Keygraph's open pentester reads source code white-box and fires real payloads — SQLi, XSS, SSRF, auth bypass with PoCs. 96.15% on the XBOW benchmark.

Shannon Lite XBOW benchmark chart — 96.15% (100/104 challenges)
Source: GitHub (KeygraphHQ)

Shannon Lite passed 100 of 104 challenges on the XBOW benchmark, a 96.15% score that sets a new state of the art (SOTA) in autonomous pentesting. Released by Keygraph under AGPL-3.0, the open-source edition collected 1,400 stars in its first 24 hours, has since reached 31K in total, and took the top spot on GitHub trending.
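
The headline figure is simply the pass ratio, and a few lines are enough to reproduce it. In this minimal sketch the helper name passRate is hypothetical; only the counts 100 and 104 come from the benchmark result.

```typescript
// Hypothetical helper for illustration; not part of Shannon or XBOW.
function passRate(passed: number, total: number): string {
  if (total <= 0 || passed < 0 || passed > total) {
    throw new RangeError("need 0 <= passed <= total and total > 0");
  }
  // Express the ratio as a percentage with two decimal places.
  return ((passed / total) * 100).toFixed(2) + "%";
}
```

Calling passRate(100, 104) yields "96.15%", which matches the reported score and leaves four unsolved challenges.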

The pitch: a full pentest in 30 minutes to 1.5 hours, averaging $50 in API spend.

Why Keygraph built it

Existing pentest tools split into two camps. On one side, interactive tools driven by humans — Burp Suite, OWASP ZAP. On the other, signature-based scanners like Nuclei. The first is expensive and slow; the second lacks payload diversity.

Shannon plants an LLM in between. White-box (source-access) mode reads the codebase, infers attack vectors, and fires actual payloads to produce PoCs. Not simulated — executed.

[IMG#1]

The four-stage pipeline

recon → parallel analysis → parallel exploit → report
  1. Recon — auto-map target routes, auth flows, external API calls.
  2. Parallel analysis — multiple LLM instances scan code patterns concurrently.
  3. Parallel exploit — fires real payloads in an isolated environment.
  4. Report — packages successful exploits with PoC video, payload, and reproduction steps.
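
The four stages above can be sketched as a small async pipeline. This is only an illustration of the recon, fan-out, fan-out, report shape. It assumes one analyzer call per discovered route and one exploit attempt per candidate. Every type and function name here (Surface, Candidate, Exploit, Stages, runPipeline, stubStages) is hypothetical and does not come from the Shannon codebase.

```typescript
// Hypothetical types; none of these names come from Shannon itself.
type Surface = { routes: string[] };
type Candidate = { route: string; vulnClass: string };
type Exploit = { candidate: Candidate; succeeded: boolean; poc: string };

interface Stages {
  recon(target: string): Promise<Surface>;
  analyze(route: string): Promise<Candidate[]>;
  exploit(candidate: Candidate): Promise<Exploit>;
}

async function runPipeline(target: string, stages: Stages): Promise<Exploit[]> {
  // 1. Recon: map the attack surface once.
  const surface = await stages.recon(target);

  // 2. Parallel analysis: one analyzer call per route, all in flight together.
  const perRoute = await Promise.all(surface.routes.map((r) => stages.analyze(r)));
  const candidates = perRoute.reduce<Candidate[]>((all, c) => all.concat(c), []);

  // 3. Parallel exploit: attempt every candidate concurrently.
  const attempts = await Promise.all(candidates.map((c) => stages.exploit(c)));

  // 4. Report: keep only the attempts that actually succeeded.
  return attempts.filter((a) => a.succeeded);
}

// Stub stages so the sketch runs without touching any real target.
const stubStages: Stages = {
  recon: async () => ({ routes: ["/login", "/search"] }),
  analyze: async (route) =>
    route === "/search" ? [{ route, vulnClass: "SQLi" }] : [],
  exploit: async (candidate) => ({
    candidate,
    succeeded: true,
    poc: `PoC for ${candidate.vulnClass} on ${candidate.route}`,
  }),
};
```

With the stubs, runPipeline("https://your.app", stubStages) resolves to a single confirmed SQLi entry for /search. In a real tool the exploit stage would run inside the isolated worker rather than in-process.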

Tech stack

  • Language: TypeScript / Node.js
  • Bundler: tsdown (ESM)
  • Isolation: Docker worker image (~1GB)
  • LLM: tuned for the Claude API
  • Run: a single npx @keygraph/shannon command, or docker pull for the worker image

Tuning for Claude is the key design decision. Other LLMs work, but per the README, exploit-code generation and tool-call accuracy are most stable on Claude 4.5 Opus.

Repo comparison

Repo                    | Stars | License    | Position
KeygraphHQ/shannon      | 31K   | AGPL-3.0   | Autonomous AI pentester, white-box
xbow-engineering/xbow   | 18K   | Apache-2.0 | Autonomous pentester, owns benchmark
ProjectDiscovery/nuclei | 21K   | MIT        | Signature scanner
OWASP ZAP               | 13K   | Apache-2.0 | Interactive + auto scanner

Shannon now leads the "AI autonomous pentester" category by stars. SOTA on XBOW's own benchmark adds credibility.

[IMG#2]

Why now — ecosystem context

Four trends are converging. (1) Claude 4.5 Opus crossed the threshold for white-box exploit reasoning. (2) MCP makes plugging vulnerability databases and security tools into LLMs straightforward. (3) Commercial autonomous pentest services (XBOW, PortSwigger AI) have validated demand, opening room for an OSS alternative. (4) AGPL's strong copyleft blocks "SaaS-rebrand" companies, building trust in the OSS edition.

Top Hacker News comment: "Could be the next Recon-ng/Nmap baseline."

Getting started

# Node 18+ required
npx @keygraph/shannon

# or Docker
docker pull keygraphhq/shannon-worker
docker run keygraphhq/shannon-worker --target https://your.app

Common gotchas: set ANTHROPIC_API_KEY before running. There is no free-tier path, so expect about $50 per run. Only target assets you own or have explicit written authorization to test.
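
Both gotchas can be checked before a run starts. The sketch below is a hypothetical pre-flight helper, not part of the Shannon CLI. The function name preflight, the authorizedHosts allowlist, and the error messages are all assumptions made for illustration. It takes the environment as a plain object so it stays self-contained, and it uses a simple regular expression rather than a full URL parser to pull out the host.

```typescript
// Hypothetical pre-flight helper; not part of the Shannon CLI.
function preflight(
  env: Record<string, string | undefined>,
  target: string,
  authorizedHosts: string[],
): string[] {
  const problems: string[] = [];

  // The API key must be present and non-empty before any run.
  if (!env["ANTHROPIC_API_KEY"]) {
    problems.push("ANTHROPIC_API_KEY is not set");
  }

  // Extract the host from an http(s) URL; anything else is rejected.
  const match = /^https?:\/\/([^\/:?#]+)/i.exec(target);
  if (match === null) {
    problems.push("target is not an http(s) URL");
  } else {
    const host = String(match[1]).toLowerCase();
    // Refuse hosts that are missing from the caller's own allowlist.
    if (authorizedHosts.indexOf(host) === -1) {
      problems.push("no written authorization on file for " + host);
    }
  }
  return problems;
}
```

In a Node script this could be called as preflight(process.env, "https://your.app", ["your.app"]). An empty result means both checks passed, and any returned message should stop the run.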

Limits and outlook

Two current limits. (1) Roughly 8% false positives in real-world deployments, which is better than Burp's ~12% but not zero. (2) AGPL creates friction for embedding Shannon inside a product; Shannon Pro (SaaS) is the alternative.
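
To get a feel for what those percentages mean in triage work, here is a rough sketch. It assumes the quoted rate is the share of reported findings that turn out not to be exploitable, which the article does not spell out. The helper name expectedFalsePositives is hypothetical.

```typescript
// Hypothetical helper for illustration; not part of Shannon.
// Assumes "false-positive rate" means the share of reported findings
// that turn out not to be exploitable.
function expectedFalsePositives(reportedFindings: number, rate: number): number {
  if (reportedFindings < 0 || rate < 0 || rate > 1) {
    throw new RangeError("findings must be >= 0 and rate must be in [0, 1]");
  }
  // Round to whole findings, since a triage queue holds whole items.
  return Math.round(reportedFindings * rate);
}
```

Under that assumption, 100 reported findings would include about 8 to discard at an 8% rate, versus about 12 at a ~12% rate.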

Outlook: the next six months will likely bring (a) non-Claude LLM adapters, (b) a better black-box (no source) mode, and (c) MCP integrations into SIEM and ticketing. The team also has a defensive (blue-team) variant on the roadmap.

[IMG#3]

3-Line Summary

  • Shannon Lite hit 96.15% on XBOW, a new autonomous-pentest SOTA, with 1,400 stars on day one.
  • Four-stage pipeline (recon, analysis, exploit, report) wraps a full pentest in 30-90 min.
  • AGPL keeps SaaS rebranders out; Pro SaaS plus OSS sustains the business model.
