An open-source AI pentester called Shannon hit GitHub's trending page this week, pulling in thousands of stars with a pitch that resonates with anyone who's shipped code faster than their security team can review it. Built by Keygraph, the tool claims to autonomously discover and exploit web application vulnerabilities, delivering proof-of-concept attacks rather than the usual wall of theoretical warnings.
The GitHub repo frames the problem bluntly: your team ships code daily, your pentest happens annually. That's 364 days of security gap. Shannon positions itself as the fix.
What it actually does
Shannon runs whitebox security audits: given access to a web application's source code, it reads the codebase, maps attack surfaces, then launches real exploits through a built-in browser to confirm vulnerabilities are exploitable. SQL injection, XSS, SSRF, broken authentication: the standard OWASP categories, with more reportedly in development.
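To make the category concrete, here is the kind of flaw an injection agent hunts for. This is an illustrative sketch, not Shannon's code: the function names and the handler shape are invented for the example.

```typescript
// Vulnerable pattern: user input interpolated directly into SQL.
// A whitebox tool can spot this in source, then confirm it in the browser.
function buildQueryUnsafe(username: string): string {
  return `SELECT * FROM users WHERE name = '${username}'`;
}

// A classic payload closes the quote and turns the WHERE clause
// into a tautology, returning every row.
const payload = "' OR '1'='1";
const injected = buildQueryUnsafe(payload);
// injected: SELECT * FROM users WHERE name = '' OR '1'='1'

// Safer pattern: a parameterized placeholder; the database driver
// binds the value, so the payload stays inert data.
function buildQuerySafe(username: string): [string, string[]] {
  return ["SELECT * FROM users WHERE name = ?", [username]];
}
```

The difference between the two functions is exactly what separates a theoretical warning from a proof-of-concept: the unsafe version can be demonstrated live, which is what Shannon's exploitation step claims to do.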
The multi-agent architecture parallelizes the reconnaissance and exploitation phases, which means it's not just running one attack chain at a time. Under the hood, it relies on Anthropic's Claude models, either through an API key or Claude Code OAuth. The whole pipeline runs in Docker, starts with a single command, and produces a pentest report at the end.
A full run takes about 1 to 1.5 hours and costs roughly $50 in API calls using Claude Sonnet, according to Keygraph's documentation. That's cheap compared to a human pentester, but whether it catches the same things is a different question entirely.
The 96% number needs context
Shannon claims a 96.15% success rate on the XBOW benchmark, which sounds spectacular. But the project's own documentation is refreshingly honest about the caveat: Shannon ran in whitebox mode with full source code access on a cleaned, hint-free version of the benchmark. Previous results from XBOW itself and human pentesters hit around 85%, but those were blackbox tests.
"These results are not apples-to-apples," the benchmark writeup admits. Shannon had advantages that real-world attackers (and the benchmark's original testers) didn't. The XBOW benchmark consists of 104 CTF-style challenges, originally developed by third-party contractors for XBOW to test their own commercial platform. Shannon solved 100 of them.
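The oddly precise headline figure follows directly from the challenge count, a quick sanity check using the numbers in the writeup:

```typescript
// Benchmark arithmetic: 100 of 104 CTF-style challenges solved.
const solved = 100;
const total = 104;
const successRate = (solved / total) * 100;
console.log(successRate.toFixed(2) + "%"); // 96.15%
```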
Still, prior open-source efforts like MAPTA topped out around 77% on the same benchmark. Shannon's lead over existing open-source tools is substantial, even accounting for the whitebox advantage.
Who's behind this
Keygraph positions Shannon as one piece of a broader compliance platform they describe as "Rippling for Cybersecurity," aimed at SOC 2 and HIPAA workflows. The GitHub org lists no public members, which is unusual for an open-source project courting community contributions. The repo itself is written in TypeScript and licensed under AGPL-3.0.
The project has attracted attention from notable names in the AI and security communities. According to SourcePulse tracking, its stargazers include Aravind Srinivas (Perplexity co-founder), Boris Cherny (creator of Claude Code at Anthropic), and Lysandre Debut (Hugging Face's chief open-source officer). That's a strong signal of interest, though starring a repo and depending on it in production are very different things.
So should you run it?
Shannon comes with a blunt warning: this is not a passive scanner. The exploitation agents execute real attacks. Data gets modified. If you point this at production, you're going to have a bad day. The project recommends isolated test environments, and that advice is worth taking literally.
The Lite version on GitHub covers injection, XSS, SSRF, and authentication bypass. Shannon Pro, their commercial offering, adds LLM-powered data flow analysis inspired by the LLMDFA paper and CI/CD integration for teams that want continuous security testing in their deployment pipeline.
There's also an experimental router mode for running against OpenAI or Google Gemini via OpenRouter, though Keygraph marks it as unsupported. The primary path is Anthropic's stack.
The real test for Shannon won't be benchmark scores. It'll be whether teams actually integrate it into their development workflows and whether the findings hold up against the messy, non-CTF reality of production web applications. Keygraph says CI/CD integration and expanded vulnerability coverage are on their roadmap. The repo was last updated February 9, 2026.