LLMs & Foundation Models

Sakana AI Opens Beta for Fugu, a Multi-Model Orchestration API

Sakana's second commercial product routes tasks across GPT, Claude, and Gemini through one OpenAI-compatible endpoint.

Oliver Senti
Senior AI Editor
April 25, 2026 · 3 min read

[Image: multiple AI model nodes connected through a central orchestration hub]

Tokyo lab Sakana AI opened beta applications on April 24 for Sakana Fugu, a multi-agent orchestration system that routes tasks across frontier foundation models from OpenAI, Google, and Anthropic. It's the second commercial product the lab has shipped in roughly three weeks, following the enterprise research agent Marlin in early April.

How it actually works

Fugu is, in Sakana's framing, a small language model that learns to call other LLMs. Instead of hand-coding which model handles math, which handles code, and which handles the chain-of-thought stitching, Fugu learns to assemble the team and divide the work. The blog post calls it their "internal secret weapon" that handled engineering and research tasks before they decided to charge for it.

The API is OpenAI-format compatible, so anyone already wired into GPT, Claude, or Gemini can swap in a Fugu endpoint. There are two tiers: Fugu Mini for latency-sensitive work, Fugu Ultra for the harder stuff. Sakana hasn't published per-token pricing, but says you pay them, not the underlying providers. The orchestration arithmetic, they claim, ends up "tens of times cheaper" than running the models yourself. That number is worth testing rather than taking on faith.
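"OpenAI-format compatible" typically means an existing client only needs a different base URL and model id; the request body keeps the same shape. A minimal sketch of that payload, assuming a hypothetical `api.sakana.ai/v1` endpoint and `fugu-mini` model id (neither is confirmed in Sakana's announcement):

```python
import json

# Hypothetical values -- Sakana has not published endpoint URLs or model ids.
FUGU_BASE_URL = "https://api.sakana.ai/v1"  # assumption, not confirmed
FUGU_MODEL = "fugu-mini"                    # assumption, not confirmed


def build_chat_request(prompt: str, model: str = FUGU_MODEL) -> dict:
    """Build an OpenAI-style /chat/completions request body.

    Because Fugu's API follows the OpenAI format, this is the same shape
    a client already sending requests to GPT, Claude, or Gemini proxies
    produces; only the base URL and the model id change.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


payload = build_chat_request("Summarize this pull request.")
print(json.dumps(payload, indent=2))
```

Whether Fugu also mirrors OpenAI's optional parameters (tool calls, streaming, temperature) is something beta testers will have to verify.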

The benchmark, with an asterisk

Fugu Ultra hits 54.2 on SWE-Pro, 95.1 on GPQA-D, and 93.2 on LiveCodeBench v6, edging out Opus 4.6, Gemini 3.1, and GPT 5.4 on each. The margins are narrow, especially on SWE-Pro.

A footnote in Sakana's table deserves a closer look. The Opus 4.6 SWE-Pro score of 53.4 is, per Sakana, Anthropic's self-reported number from a custom scaffold; Sakana ran its own evaluations on the mini-swe-agent scaffold but kept getting timeouts on Opus, so they imported Anthropic's figure instead. Not an unreasonable workaround. It also means the headline "Fugu beats Opus" comparison is between two slightly different evaluation setups, which is the kind of detail benchmark tables tend to absorb.

Recursion as a compute axis

Forget the leaderboard for a moment. The technically interesting part is that Fugu can call itself.

According to Trinity and Conductor, the two ICLR 2026 papers Fugu is built on, the system can recursively read its own output, decide its first coordination strategy was bad, and spin up a corrective workflow. Recursion depth becomes a knob you turn at inference time, no retraining required. A small router model, by reading what it just produced, reaches answers neither it nor any single worker could have produced in one pass.
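Sakana hasn't published Fugu's routing code, but the idea of recursion depth as an inference-time knob can be shown with a toy router: dispatch a task to a cheap worker, inspect the output, and if a check fails, re-plan one level deeper, up to a caller-supplied limit. Every name here is illustrative, not Sakana's API:

```python
from typing import Callable


def quick_worker(task: str) -> str:
    # Cheap first attempt that sometimes comes back incomplete:
    # here it truncates the answer to half the input length.
    return task.upper()[: len(task) // 2]


def careful_worker(task: str) -> str:
    # Slower attempt the router only invokes when it recurses.
    return task.upper()


def route(task: str, max_depth: int, check: Callable[[str, str], bool],
          depth: int = 0) -> str:
    """Dispatch a task, read the output, and recurse if it looks wrong.

    `max_depth` is the inference-time knob: raising it buys more
    corrective passes at more cost, with no retraining of the router.
    """
    worker = quick_worker if depth == 0 else careful_worker
    result = worker(task)
    if depth < max_depth and not check(task, result):
        # The router reads its own output, decides the first plan was
        # bad, and spins up a corrective workflow one level deeper.
        return route(task, max_depth, check, depth + 1)
    return result


complete = lambda task, out: len(out) == len(task)
print(route("hello world", max_depth=2, check=complete))  # → HELLO WORLD
print(route("hello world", max_depth=0, check=complete))  # → HELLO
```

With `max_depth=0` the router must accept its first, truncated answer; allowing recursion lets the same router reach an output its first pass could not.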

Whether this matters in production is a different question. Recursive scaling means recursive cost, and the "tens of times cheaper" pitch lives or dies on how often Ultra has to recurse to actually beat the underlying models.
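Since pricing is unpublished, any cost model is speculative, but the break-even logic is plain arithmetic: if corrective passes re-invoke workers, expected cost scales with the average number of passes per request. A sketch with entirely made-up numbers:

```python
def effective_cost(base_cost: float, recursion_rate: float,
                   passes_when_recursing: int) -> float:
    """Expected cost per request when some fraction of requests recurse.

    base_cost: cost of one orchestration pass (arbitrary units).
    recursion_rate: fraction of requests that need corrective passes.
    passes_when_recursing: total passes consumed by those requests.
    """
    return base_cost * ((1 - recursion_rate)
                        + recursion_rate * passes_when_recursing)


# Illustrative only: suppose one Fugu pass were 50x cheaper than calling
# the frontier models directly. How fast does recursion erode that edge?
direct_cost = 50.0   # hypothetical cost of running the models yourself
single_pass = 1.0    # hypothetical cost of one Fugu pass
for rate in (0.0, 0.2, 0.5):
    cost = effective_cost(single_pass, rate, passes_when_recursing=4)
    print(f"recursion rate {rate:.0%}: {direct_cost / cost:.1f}x cheaper")
```

Under these invented numbers, a 50% recursion rate with four passes per recursing request shrinks a 50x advantage to 20x, still "tens of times" but a very different margin, which is why the recursion frequency in real workloads is the figure worth measuring.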

The beta application form is open now. Sakana says it wants testers using coding assistants like OpenCode and Codex, plus engineering and business projects where the multi-model angle might pay off.

Tags: sakana ai, fugu, ai orchestration, multi-agent systems, llm router, iclr 2026, ai benchmarks, frontier models, ai agents
Former software engineer turned tech writer, Oliver has spent the last five years tracking the AI landscape. He brings a practitioner's eye to the hype cycles and genuine innovations defining the field, helping readers separate signal from noise.
