Enterprise AI

Microsoft Pairs GPT and Claude Inside Copilot Researcher

New Critique feature uses GPT for drafting and Claude for review, with Microsoft claiming a 13.88% gain on the DRACO research benchmark.

Andrés Martínez, AI Content Writer
March 31, 2026
[Image: Split-screen showing two AI models collaborating on a research document, one generating text while the other reviews with highlighted corrections]

Microsoft is making GPT and Claude work together inside a single research workflow. The company's Researcher agent in Microsoft 365 Copilot now ships with a feature called Critique: GPT drafts the research report, then Claude reviews it for accuracy, completeness, and citation quality before the user sees anything.
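
The draft-then-review flow described above can be sketched in a few lines. This is a minimal illustration, not Microsoft's implementation: `run_critique`, `CritiqueResult`, and the two stub model functions are hypothetical names, the prompts are invented, and the article does not say how the reviewer's feedback is folded back into the report, so the sketch simply returns both pieces.

```python
from dataclasses import dataclass
from typing import Callable

# Assumption: a "model" is any function that maps a prompt string to a text reply.
ModelFn = Callable[[str], str]


@dataclass
class CritiqueResult:
    draft: str   # report produced by the drafting model
    review: str  # reviewer's notes on that report


def run_critique(question: str, drafter: ModelFn, reviewer: ModelFn) -> CritiqueResult:
    """One-directional pipeline: the drafter writes, then the reviewer checks."""
    draft = drafter(f"Write a research report answering: {question}")
    review = reviewer(
        "Review this report for accuracy, completeness, and citation quality:\n"
        + draft
    )
    return CritiqueResult(draft=draft, review=review)


# Stub models so the sketch runs without any API access (purely illustrative).
def stub_drafter(prompt: str) -> str:
    return "DRAFT for: " + prompt


def stub_reviewer(prompt: str) -> str:
    return "REVIEW of " + str(len(prompt)) + " characters"


result = run_critique("What is DRACO?", stub_drafter, stub_reviewer)
print(result.draft)
print(result.review)
```

Passing the models in as plain functions keeps the roles swappable, which is the shape a bidirectional version would also need.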

Critique becomes the default when users select "Auto" in the model picker. Microsoft says the setup scored 13.88% higher on the DRACO benchmark (100 research tasks across 10 domains) than Perplexity Deep Research running Claude Opus 4.6, which was the previous top performer. Those are Microsoft's own numbers using the benchmark's evaluation protocol, so independent confirmation is pending. Jared Spataro, Microsoft's CMO for AI at Work, said the workflow is currently one-directional but "we expect this workflow to be bidirectional in the future."

A second addition, Model Council, runs multiple models on the same prompt simultaneously and displays their outputs side by side, flagging where they agree and diverge. Both features are available now through Microsoft's Frontier program.

Microsoft also expanded access to Copilot Cowork, a Claude-powered agent for delegating multi-step tasks like calendar scheduling, file management, and daily briefings. The broader context: Microsoft reported 15 million paid Copilot seats in January, roughly 3.3% of its 450 million commercial Microsoft 365 users. Features like Critique look designed to close that gap by addressing trust, the persistent objection to AI-assisted research in enterprise settings.
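
The adoption figure is easy to check from the two numbers quoted above. The snippet below is a quick sanity check using only those reported values; the variable names are illustrative.

```python
# Figures as reported in the article.
paid_copilot_seats = 15_000_000
commercial_m365_users = 450_000_000

# Share of commercial users with a paid Copilot seat.
share = paid_copilot_seats / commercial_m365_users
# Users without a paid seat, i.e. the "gap" referred to above.
remaining = commercial_m365_users - paid_copilot_seats

print(f"Paid share: {share:.1%}")
print(f"Users without a paid seat: {remaining:,}")
```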


Bottom Line

Microsoft's Researcher agent now uses GPT to draft and Claude to review research reports. By Microsoft's own figures, the dual-model setup scored 13.88% higher on the DRACO benchmark than Perplexity Deep Research running Claude Opus 4.6, the previous top performer.

Quick Facts

  • Critique: GPT drafts, Claude reviews for accuracy and citations
  • 13.88% improvement on DRACO benchmark over Perplexity Deep Research (company-reported)
  • +7.0 point gain on aggregated DRACO score
  • 15 million paid Copilot seats as of January 2026 (3.3% of 450M commercial users)
  • Copilot Cowork (Claude-powered) now available via Frontier program
Tags: Microsoft 365 Copilot, multi-model AI, OpenAI, Anthropic, Claude, enterprise AI, Copilot Cowork
