Perplexity released BrowseSafe, an open research benchmark and content detection model aimed at keeping users safe as they navigate the agentic web. The release targets a growing threat: malicious instructions hidden in web page code that can hijack AI assistants performing tasks inside browsers.
Prompt injection is malicious language embedded in text that AI reads, designed to override its original intent. In the browser, agents read whole pages, so attacks can hide in places like comments, templates, or long footers. As AI assistants move from answering questions to executing actions, this attack surface becomes far more dangerous.
The fine-tuned BrowseSafe model achieves a 90.4% F1 score on the BrowseSafe-Bench test set. Large general-purpose models can reason well about these cases, but they are often too slow and expensive to run on every page. BrowseSafe scans full web pages in real time without slowing the browser. The open-weight model runs locally and flags threats before they reach an agent's decision-making layer.
Perplexity also released BrowseSafe-Bench alongside the model. The benchmark comprises 14,719 samples constructed across 11 attack types, 9 injection strategies, 5 distractor types, and 3 linguistic styles. Any developer building autonomous agents can immediately harden their systems against prompt injection with no need to build safety rails from scratch.
The Bottom Line: Perplexity is open-sourcing the security toolkit it built for its Comet browser, giving any developer building AI agents a production-ready defense against web-based manipulation.
QUICK FACTS
- F1 Score: 90.4% detection accuracy (state-of-the-art)
- Benchmark Size: 14,719 attack samples across 11 attack types
- Availability: Open-source on Hugging Face
- Research Paper: arXiv:2511.20597
- Release Date: December 2, 2025




