
Perplexity Upgrades Deep Research, Open-Sources DRACO Benchmark

Advanced tier now runs on Claude Opus 4.5 with proprietary search infrastructure.

Andrés Martínez, AI Content Writer
February 5, 2026 · 2 min read

Perplexity has shipped a major update to its Deep Research feature, pairing Anthropic's Claude Opus 4.5 model with Perplexity's own search tools and code execution sandbox. The company claims state-of-the-art results on external benchmarks, including 79.5% on Google DeepMind's DeepSearchQA. Max subscribers get immediate access; Pro users will see the update roll out over the coming days.

Alongside the product upgrade, Perplexity released DRACO (Deep Research Accuracy, Completeness, and Objectivity), a new open-source benchmark for evaluating research agents. The dataset includes 100 tasks across ten domains, from medicine and law to finance and UX design. Each task comes with expert-crafted rubrics averaging around 40 evaluation criteria. The tasks originated from actual user queries where initial responses fell short, which Perplexity argues makes the benchmark harder than synthetic academic tests.
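Perplexity hasn't published its grading code in this announcement, but rubric-based benchmarks like DRACO generally reduce to checking each criterion against an agent's report and averaging. Below is a minimal, hypothetical sketch of that scoring shape; the class names, fields, and pass/fail grading are illustrative assumptions, not Perplexity's released implementation:

```python
from dataclasses import dataclass

@dataclass
class Criterion:
    """One rubric item, e.g. 'cites the primary source for the key statistic'."""
    description: str
    satisfied: bool  # judged by a human or LLM grader (assumption)

@dataclass
class Task:
    """One DRACO-style research task with its expert-written rubric (~40 criteria)."""
    prompt: str
    rubric: list[Criterion]

def task_score(task: Task) -> float:
    """Fraction of rubric criteria the agent's report satisfied."""
    if not task.rubric:
        return 0.0
    return sum(c.satisfied for c in task.rubric) / len(task.rubric)

def benchmark_score(tasks: list[Task]) -> float:
    """Average per-task score across all 100 tasks, expressed as a percentage."""
    return 100.0 * sum(task_score(t) for t in tasks) / len(tasks)
```

Real rubrics may weight criteria or grade them on a scale rather than pass/fail; this sketch only illustrates why expert-written criteria make scores auditable in a way end-to-end judgments are not.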

On its own benchmark, Perplexity reports a 67.15% score, ahead of Gemini Deep Research at 58.97% and OpenAI's o3 at 52.06%. Citation quality stands out: 76% versus 60.4% for o3, per the company's numbers. These are self-reported results, so independent validation remains pending.

The update also delivered speed gains. Perplexity's system completed benchmark tasks in 459.6 seconds on average, compared with 592 to 1,808 seconds for competitors, the company claims.

The Bottom Line: Perplexity is betting that vertically integrated search plus top-tier reasoning models can outperform competitors charging ten times more per query.


QUICK FACTS

  • DRACO benchmark: 100 tasks, 10 domains, ~40 evaluation criteria per task
  • Perplexity DRACO score: 67.15% (self-reported)
  • DeepSearchQA score: 79.5% (self-reported)
  • Average completion time: 459.6 seconds vs. 592-1,808 seconds for competitors (company-reported)
  • Availability: Max users now, Pro users in coming days
Andrés Martínez
AI Content Writer

Andrés reports on the AI stories that matter right now. No hype, just clear, daily coverage of the tools, trends, and developments changing industries in real time. He makes the complex feel routine.


