Perplexity has shipped a major update to its Deep Research feature, pairing Anthropic's Claude Opus 4.5 model with its proprietary search tools and code execution sandbox. The company claims state-of-the-art results on external benchmarks, including 79.5% on Google DeepMind's DeepSearchQA. Max subscribers get immediate access; Pro users will see the update roll out over the coming days.
Alongside the product upgrade, Perplexity released DRACO (Deep Research Accuracy, Completeness, and Objectivity), a new open-source benchmark for evaluating research agents. The dataset includes 100 tasks across ten domains, from medicine and law to finance and UX design. Each task comes with an expert-crafted rubric averaging roughly 40 evaluation criteria. The tasks originated from actual user queries where initial responses fell short, which Perplexity argues makes the benchmark harder than synthetic academic tests.
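For readers curious what rubric-based grading looks like in practice, the sketch below shows one hypothetical way such a benchmark could be scored: each task carries a list of criteria, a grader judges each criterion as met or not, and the headline number is the mean per-task fraction satisfied. The `Criterion` and `Task` classes, the weighting, and the aggregation are illustrative assumptions, not DRACO's published data format or scoring protocol.

```python
from dataclasses import dataclass, field

@dataclass
class Criterion:
    # One expert-written check, e.g. "cites at least one peer-reviewed source"
    description: str
    weight: float = 1.0  # hypothetical: DRACO may or may not weight criteria

@dataclass
class Task:
    prompt: str
    rubric: list[Criterion] = field(default_factory=list)  # ~40 criteria per task, per Perplexity

def score_task(task: Task, judgments: list[bool]) -> float:
    """Weighted fraction of rubric criteria satisfied for one task.

    `judgments[i]` records whether criterion i was met, as decided by a grader
    (human or LLM judge); the grading protocol itself is an assumption here.
    """
    total = sum(c.weight for c in task.rubric)
    earned = sum(c.weight for c, met in zip(task.rubric, judgments) if met)
    return earned / total if total else 0.0

def benchmark_score(per_task_scores: list[float]) -> float:
    """Mean per-task score across the dataset, expressed as a percentage."""
    return 100 * sum(per_task_scores) / len(per_task_scores)
```

Under this kind of scheme, a reported figure like 67.15% would simply be the average share of rubric criteria an agent's reports satisfied across all 100 tasks; how Perplexity actually adjudicates each criterion is not detailed here.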
On its own benchmark, Perplexity reports a 67.15% score, ahead of Gemini Deep Research at 58.97% and OpenAI's o3 at 52.06%. Citation quality stands out: 76% versus 60.4% for o3, per the company's numbers. These are self-reported results, so independent validation remains pending.
The update also delivered speed gains. Perplexity's system completed benchmark tasks in 459.6 seconds on average, compared with 592 to 1,808 seconds for competitors, the company claims.
The Bottom Line: Perplexity is betting that vertically integrated search plus top-tier reasoning models can outperform competitors charging ten times more per query.
QUICK FACTS
- DRACO benchmark: 100 tasks, 10 domains, ~40 evaluation criteria per task
- Perplexity DRACO score: 67.15% (self-reported)
- DeepSearchQA score: 79.5% (self-reported)
- Average completion time: 459.6 seconds vs. 592-1,808 seconds for competitors (company-reported)
- Availability: Max users now, Pro users in coming days