
Google's new Deep Research agents add MCP support and native chart generation

Deep Research Max runs on Gemini 3.1 Pro and now taps proprietary enterprise data via MCP.

Oliver Senti, Senior AI Editor
April 22, 2026 · 4 min read
[Image: Abstract representation of an autonomous AI research agent processing multiple data streams and generating charts]

Google on Tuesday launched two new agents in the Gemini API, Deep Research and Deep Research Max, both running on Gemini 3.1 Pro. The bigger release is Max, a test-time-compute variant aimed at overnight analyst workflows. The headline change across both is Model Context Protocol support, which for the first time lets the agents reach into proprietary enterprise data.

Sundar Pichai announced it on X. The developer docs went live the same day.

The MCP thing is what matters

Strip away the marketing and MCP is the actual news here. Until Tuesday, Deep Research could search the open web. That's useful. But if you're a hedge fund analyst, your workflow starts with FactSet terminals, PitchBook, internal CRMs, data universes the public web can't see. Google says it's already working with FactSet, S&P Global, and PitchBook on MCP server designs.

That's a pointed list of partners. Financial services is where autonomous research agents either prove out or die in procurement, and those three names cover the dominant institutional data pipes. If Google can make MCP integration work cleanly there, the rest of the API story matters less.

The agents support arbitrary MCP servers, tool definitions, and multimodal inputs (PDFs, CSVs, images, audio, video). Web access can be turned off entirely for fully internal research. None of that breaks new ground. It is table stakes if you want enterprise customers.
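To make the configuration surface concrete, here is a minimal sketch of what a request combining those pieces might look like. The field names, server URLs, and the `validate_request` helper are all illustrative assumptions, not Google's documented schema; the point is the shape: MCP sources, a web-access toggle, and multimodal attachments in one request.

```python
# Hypothetical request shape for a Deep Research run with MCP sources.
# Field names are illustrative, not the Gemini API's documented schema.
research_request = {
    "agent": "deep-research",
    "prompt": "Summarize Q1 deal flow for European fintech",
    "tools": {
        # Arbitrary MCP servers the agent may call for proprietary data.
        "mcp_servers": [
            {"name": "factset", "url": "https://mcp.example.internal/factset"},
            {"name": "internal-crm", "url": "https://mcp.example.internal/crm"},
        ],
        # Web access can be disabled for fully internal research.
        "web_search": False,
    },
    # Multimodal attachments: PDFs, CSVs, images, audio, video.
    "attachments": ["deck.pdf", "pipeline.csv"],
}

def validate_request(req: dict) -> bool:
    """Sanity check: a run with web search off must declare at least one MCP source,
    or the agent has nothing to research against."""
    tools = req.get("tools", {})
    if not tools.get("web_search") and not tools.get("mcp_servers"):
        return False
    return True

print(validate_request(research_request))  # expect True
```

The interesting design decision is the toggle: with `web_search` off, everything the agent sees comes through MCP, which is exactly the posture a compliance team would demand.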

About those benchmarks

Pichai's post cited 93.3% on DeepSearchQA and 54.6% on HLE, numbers that sound impressive until you start comparing them to what other labs publish.

The Decoder's Matthias Bastian flagged the issue quickly. Google's chart compares Deep Research Max against OpenAI's GPT-5.4 and Anthropic's Opus 4.6, but GPT-5.4 isn't OpenAI's strongest search-optimized model. That would be GPT-5.4 Pro, which hits 89.3% on BrowseComp according to OpenAI's own numbers. Standard GPT-5.4 lands at 82.7%. Google didn't include Pro in the comparison.

Anthropic's reported BrowseComp score for Opus 4.6 is 84%, higher than what Google's chart shows. The gap, per Anthropic, comes down to configuration: Opus 4.6 scores better on that benchmark with reasoning turned off, while Google tested it at a high reasoning intensity.

Are Google's numbers wrong? Not exactly. Apples-to-apples benchmark comparisons between labs are notoriously hard because methodology drifts between raw API testing and whatever scaffolding a company wraps around its own model. But when your chart omits the strongest competitor configuration and tests others at suboptimal settings, it stops being measurement and starts being marketing.

Two agents, two use cases

Deep Research replaces the December preview. It is faster, cheaper, and aimed at real-time surfaces where a user is waiting on an answer. Standard chat latency, essentially.

Max is the interesting one. Google explicitly frames it for asynchronous workflows: nightly cron jobs that drop due diligence reports in analysts' inboxes by morning. Extended test-time compute means the agent iteratively reasons, searches, and refines its final output. You're paying for more tokens and more time to get a more thorough report.
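The "nightly cron job" framing implies a submit-and-poll pattern rather than a blocking call. A rough sketch of what that scheduling loop looks like, with `submit_research` and `poll_status` as stand-ins for whatever endpoints the Gemini API actually exposes (the polling pattern is the point, not the function names):

```python
# Sketch of a nightly due-diligence job around an asynchronous agent run.
# submit_research() and poll_status() are hypothetical placeholders, not
# real Gemini API calls.
import time

def submit_research(prompt: str) -> str:
    """Placeholder: kick off a long-running Deep Research Max job, return a job id."""
    return "job-123"

def poll_status(job_id: str) -> str:
    """Placeholder: a real client would hit a job-status endpoint here."""
    return "done"

def run_nightly(prompt: str, poll_interval_s: int = 300) -> str:
    job_id = submit_research(prompt)
    # Max runs are slow by design: extended test-time compute trades latency
    # for thoroughness, so the caller polls instead of waiting inline.
    while poll_status(job_id) != "done":
        time.sleep(poll_interval_s)
    return job_id

# Scheduled via cron, e.g.:  0 1 * * *  python nightly_research.py
print(run_nightly("Overnight diligence: acquisition targets in EU fintech"))
```

The cron entry fires at 1 a.m., the job churns through sources for hours, and the report lands in the analyst's inbox by morning, which is exactly the workflow Google is pitching.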

Whether "more thorough" translates to "more correct" is the question regulated industries are going to spend the next six months trying to answer. Google says Max consults more sources than the December release and catches nuances the older version missed, per a blog post from DeepMind product managers Lukas Haas and Srinivas Tadepalli. No external evaluation yet. I haven't seen one.

Native charts, sort of

The other first for this release is in-line chart and infographic generation. Deep Research can now render visualizations directly in reports using HTML or Nano Banana, Google's image generation model. No external libraries, no separate rendering pipeline.

This is actually clever. Enterprise research reports get dropped into slide decks, client emails, and board materials. If the agent produces something already presentation-ready, the last-mile formatting work disappears. Google's examples show charts on fiat currency performance, European fintech capital allocation, and global energy trade flows, the kind of polished outputs you'd see in a sell-side research note.

The question is whether the charts are right. Generated visualizations are notorious for subtle label errors, axis miscalibrations, and numbers that don't match the underlying data. Google's blog post doesn't explain how it validates chart accuracy. And "Nano Banana rendered a chart from a CSV" isn't a claim I would stake an analyst's job on without independent verification.
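Independent verification here does not need to be elaborate. A minimal sketch, assuming you still have the source CSV: recompute the headline numbers yourself and diff them against the values the agent rendered. The column names, tolerance, and sample data below are all invented for illustration.

```python
# Minimal check: before a generated chart ships, recompute the numbers from
# the source data and compare against what the agent rendered.
import csv
import io

source_csv = """region,capital_allocated_musd
UK,410
Germany,275
France,190
"""

def totals_from_csv(text: str) -> dict:
    """Parse the source CSV into {region: value}."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["region"]: float(row["capital_allocated_musd"]) for row in reader}

def chart_matches_source(chart_values: dict, text: str, tol: float = 0.5) -> bool:
    """True only if the chart covers the same categories and every value
    agrees with the source data within tolerance."""
    source = totals_from_csv(text)
    return set(chart_values) == set(source) and all(
        abs(chart_values[k] - source[k]) <= tol for k in source
    )

# Values read off the agent's rendered chart (hypothetical, with one error).
rendered = {"UK": 410.0, "Germany": 275.0, "France": 910.0}
print(chart_matches_source(rendered, source_csv))  # expect False: France is wrong
```

It catches exactly the failure mode described above: a chart that looks presentation-ready while one bar is off by a factor of five.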

What's next

Both agents are in public preview on paid Gemini API tiers as of Tuesday. Google Cloud availability for startups and enterprises is coming "soon," though the company hasn't committed to a date. The December release was also called a preview, and stayed that way for months.

Google hasn't published per-request pricing for Max, just that the test-time compute variant costs more than the standard version. That's the part I would want to see before committing to nightly cron workflows.

Tags: google, gemini, deep research, mcp, ai agents, gemini api, enterprise ai, gemini 3.1 pro
Oliver Senti, Senior AI Editor

Former software engineer turned tech writer, Oliver has spent the last five years tracking the AI landscape. He brings a practitioner's eye to the hype cycles and genuine innovations defining the field, helping readers separate signal from noise.


