Anthropic has released a free Claude Code plugin called Security Guidance that flags risky code the instant Claude writes or edits a file. It is live in the Claude Code plugin marketplace now, works across every plan, and installs with a single command. The pitch is simple: catch the obvious security mistakes before they ever reach a pull request.
So what does it actually do?
It runs as a pre-tool hook (PreToolUse, in Claude Code's hook terminology). Every time Claude attempts a Write, Edit, or MultiEdit operation, the plugin scans the change first and raises a warning with remediation advice before the edit lands. Warnings are session-scoped, so you see each one once and don't get nagged into oblivion. By the company's own description it covers eight vulnerability categories: command injection in GitHub Actions workflows, unsafe child_process.exec() calls, eval() and new Function(), the usual XSS suspects like innerHTML and dangerouslySetInnerHTML, Python pickle deserialization, and os.system(). Spot a child_process.exec() and it nudges you toward execFileNoThrow() instead. That name is a wrapper helper, not a Node.js built-in; the standard-library equivalent is child_process.execFile(), which takes its arguments as an array and does not spawn a shell by default.
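To make the mechanics concrete, here is a minimal Python sketch of a session-scoped regex pass. It is not the plugin's code: the rule names, the patterns, the advice strings, and the decision to show each warning once per file and rule are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    pattern: re.Pattern
    advice: str


# Hypothetical rules. The plugin's real patterns and wording are not reproduced here.
RULES = [
    Rule("os_system", re.compile(r"\bos\.system\s*\("),
         "Prefer subprocess.run([...]) with an argument list and no shell."),
    Rule("pickle_load", re.compile(r"\bpickle\.loads?\s*\("),
         "Avoid unpickling untrusted data; prefer a data-only format such as JSON."),
    Rule("eval", re.compile(r"\beval\s*\("),
         "Avoid eval() on dynamic input; ast.literal_eval() handles literal values."),
    Rule("inner_html", re.compile(r"\.innerHTML\s*="),
         "Prefer textContent, or sanitize the HTML first."),
]


class SessionScanner:
    """Scans proposed file content and reports each (file, rule) pair once."""

    def __init__(self, rules=RULES):
        self._rules = list(rules)
        self._seen = set()  # (file_path, rule_name) pairs already reported

    def check(self, file_path, new_text):
        warnings = []
        for rule in self._rules:
            key = (file_path, rule.name)
            if key in self._seen:
                continue  # assumed meaning of "session-scoped": warn only once
            if rule.pattern.search(new_text):
                self._seen.add(key)
                warnings.append(f"{file_path}: {rule.name}: {rule.advice}")
        return warnings


scanner = SessionScanner()
first = scanner.check("deploy.py", "import os\nos.system('rm -rf ' + target)\n")
second = scanner.check("deploy.py", "os.system(cmd)\n")
```

In this sketch the first call produces one warning and the second produces none, because the same file and rule have already been reported in the session. How the real plugin receives the pending edit and reports back to Claude is outside the scope of the example.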
Here's the part nobody's leading with: this is regex. Pattern matching. The same approach security tooling has leaned on for decades, now bolted onto an AI coding agent. There's no model call on that first pass, which is why it's instant and costs nothing. That's also its ceiling: a regex hook catches what someone already wrote a rule for, nothing more.
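That ceiling is easy to demonstrate. The Python sketch below uses a single illustrative pattern for os.system() (an assumption, not the plugin's actual rule) and runs it against four snippets: a direct call, an aliased import, a dynamically built attribute lookup, and a harmless comment.

```python
import re

# Illustrative pattern only; the plugin's real rule may differ.
OS_SYSTEM = re.compile(r"\bos\.system\s*\(")

SNIPPETS = {
    # The textbook case the rule was written for.
    "direct": "import os\nos.system('ping ' + host)\n",
    # Same call, but the text 'os.system(' never appears.
    "aliased": "from os import system as run\nrun('ping ' + host)\n",
    # Attribute name assembled at runtime.
    "indirect": "import os\ngetattr(os, 'sys' + 'tem')('ping ' + host)\n",
    # No call at all, just a comment that mentions one.
    "comment": "# never call os.system() in this module\n",
}


def flagged(source):
    """Return True if the pattern appears anywhere in the source text."""
    return bool(OS_SYSTEM.search(source))


results = {name: flagged(source) for name, source in SNIPPETS.items()}
```

Only the direct call and the comment are flagged. The aliased and indirect versions execute the same shell command and pass untouched, while the comment is a false positive: the pattern sees text, not data flow.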
The 30-40% number
Anthropic says internal testing showed security comments on pull requests dropping 30 to 40 percent after teams switched the plugin on. Sounds great until you poke at it. It's the company measuring its own product, against its own pre-plugin baseline, with no published methodology I could find. "Security comments on PRs" is a proxy, not a count of real vulnerabilities prevented. Fewer reviewer comments could mean cleaner code. It could also mean reviewers got tired of repeating themselves. I'm not saying the number is wrong. I'm saying it's a self-reported metric doing a lot of work in the announcement, and you'd want independent figures before betting a security program on it.
The boring vulnerabilities are exactly the ones that slip through, because everyone's sick of flagging them.
Three layers, two of them smarter
The plugin doesn't stop at the regex pass. There's a layered design: the zero-cost deterministic scan on every file write, then deeper model-backed review of generated output, and a commit-stage check that looks at surrounding context. You can feed it a plain-language threat model through a .claude/claude-security-guidance.md file and add up to 50 of your own pattern rules in a YAML config. The built-in patterns can't be switched off, which is the right call. Nobody should be able to quietly disable the baseline.
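The "add your own rules, but never remove ours" policy can be sketched in a few lines of Python. Everything here is hypothetical: the article does not describe the YAML schema, so the rule dictionaries below stand in for whatever a parsed config might look like, and the built-in names and patterns are placeholders. Only the 50-rule cap and the non-disableable baseline come from the description above.

```python
import re

MAX_CUSTOM_RULES = 50  # cap described in the article

# Placeholder built-ins; not the plugin's real rule set.
BUILTIN_RULES = {
    "os_system": r"\bos\.system\s*\(",
    "pickle_load": r"\bpickle\.loads?\s*\(",
}


def merge_rules(custom):
    """Combine built-in rules with custom ones.

    `custom` is a list of {"name": ..., "pattern": ...} dicts, a hypothetical
    shape for an already-parsed config file.
    """
    if len(custom) > MAX_CUSTOM_RULES:
        raise ValueError(f"at most {MAX_CUSTOM_RULES} custom rules are allowed")

    # Built-ins always load first and are never dropped.
    merged = {name: re.compile(pattern) for name, pattern in BUILTIN_RULES.items()}

    for rule in custom:
        name = rule["name"]
        if name in BUILTIN_RULES:
            # Reusing a built-in name would silently replace the baseline rule.
            raise ValueError(f"built-in rule {name!r} cannot be overridden or disabled")
        merged[name] = re.compile(rule["pattern"])
    return merged


rules = merge_rules([{"name": "md5", "pattern": r"\bhashlib\.md5\s*\("}])
```

The design choice worth noting is that custom rules can only add to the merged set. There is no code path that shrinks it, which is one straightforward way to guarantee the baseline stays on.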
Why ship a dumb scanner inside a smart tool?
Because AI writes insecure code, and Anthropic knows it better than most. The heavier sibling here is Claude Security, the reasoning-based system that reads a codebase the way a human researcher would and has, per Anthropic, surfaced more than 500 previously unknown vulnerabilities in widely used open-source software. The deeper pull-request machinery lives in an open-sourced security review repo, which defaults to Opus 4.1 and does diff-aware semantic analysis. The new plugin is the cheap front line. Catch the dumb stuff with regex, save the expensive model reasoning for the subtle data-flow bugs.
There's an uncomfortable backdrop to all this. Anthropic has acknowledged that attackers, including state-sponsored ones, have used its models to hunt for exploitable bugs. When the broader Claude Code Security effort was announced, security stocks twitched: CrowdStrike and Cloudflare both closed down around 8% that session, per SiliconANGLE. But whether a free regex hook moves that needle is a separate question.
Forget the 30-40% figure for a second. The number that'll actually matter is retention: do developers leave it on after the novelty wears off, or mute it like every other linter that cried wolf one time too many?




