Google Research and Google Cloud rolled out a new agentic RAG framework for the Gemini Enterprise Agent Platform on June 5, 2026. The pitch: a retrieval system that does not give up after one search. It is live now as a public preview, built into Cross-Corpus Retrieval.
The problem it is chasing
Standard RAG was built for questions that live inside a single document. Ask it something where the answer is scattered across systems and it stumbles. Google's own example: ask for the specs of a server used on Project X, and a vanilla system finds the project docs, sees a server ID, then stops. It never thinks to go take that ID into another database. You get a partial answer, or a flat "not found."
The fix is to stop treating retrieval as one step. Google's research post frames the system as a research department rather than a search bar: an Orchestrator decides the request is not a one-shot job, a Planner Agent maps which databases to hit and in what order, and a Query Rewriter splits a vague question into several precise ones before a Search Fanout Agent sends them out.
The part that actually matters
Most of that exists in other multi-agent setups. The piece Google is selling as new is the Sufficient Context Agent, which the team describes as a quality-control inspector at the end of the line. It reads the retrieved snippets, drafts a rough answer, then checks whether every part of the question got answered.
If it did not, the agent does not just shrug and say it lacks information. It writes down what is missing and why, then kicks the search back into gear. In Google's hospital example, the system found a patient's medications and diet but no allergy data, so the Sufficient Context Agent told it to go look specifically for "rashes" or adverse events. Then it found them.
"This isn't a one-step job." That is how Google narrates the Orchestrator's reasoning, which is a tidy way of describing what is really just iteration with a stopping rule. The interesting bit is the stopping rule, not the agents.
So how well does it work?
Google tested on FramesQA, derived from the FRAMES paper, using 824 multi-hop queries against a corpus of 2,676 PDFs. The headline number is up to 34% higher accuracy on factuality datasets versus standard RAG, though that figure is the broad claim across datasets, not pinned specifically to the FramesQA cross-corpus run, so read it as a ceiling rather than a single result.
The cross-corpus result is more concrete and more useful. When the Planner Agent had to pick the right source from four possible datasets, three of them deliberate distractors, the system still answered 90.1% of questions correctly. That mimics the real enterprise mess where finance, HR, and ops each guard their own database. Latency stayed within about 3% of the single-corpus version, which is the kind of detail that decides whether anyone ships this.
The accuracy comparison ran through an LLM-as-a-judge against ground truth answers. Worth keeping in mind: a model grading model output is convenient, not neutral, and Google has not published the proprietary internal benchmarks it also mentions running.
For enterprise buyers the appeal is less about benchmark bragging and more about auditability. Every step leaves a trace of what was searched and why, which is the kind of thing compliance teams ask for and flashy demos never show. The public preview is available now through the Gemini Enterprise Agent Platform documentation.




