LLMs & Foundation Models

Sakana AI Opens Beta for Fugu, a Multi-Model Orchestration API

Sakana's second commercial product routes tasks across GPT, Claude, and Gemini through one OpenAI-compatible endpoint.

Oliver Senti
Senior AI Editor
April 25, 2026 · 3 min read

[Image: multiple AI model nodes connected through a central orchestration hub]

Tokyo lab Sakana AI opened beta applications on April 24 for Sakana Fugu, a multi-agent orchestration system that routes tasks across frontier foundation models from OpenAI, Google, and Anthropic. It's the second commercial product the lab has shipped in roughly three weeks, following the enterprise research agent Marlin in early April.

How it actually works

Fugu is, in Sakana's framing, a small language model that learns to call other LLMs. Instead of hand-coding which model handles math, which handles code, and which handles the chain-of-thought stitching, Fugu learns to assemble the team and divide the work. The blog post calls it their "internal secret weapon" that handled engineering and research tasks before they decided to charge for it.

The API is OpenAI-format compatible, so anyone already wired into GPT, Claude, or Gemini can swap in a Fugu endpoint. There are two tiers: Fugu Mini for latency-sensitive work, Fugu Ultra for the harder stuff. Sakana hasn't published per-token pricing, but says you pay them, not the underlying providers. The orchestration arithmetic, they claim, ends up "tens of times cheaper" than running the models yourself. That number is worth testing rather than taking on faith.
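"OpenAI-format compatible" typically means an existing client only needs a different base URL and model id; the request body keeps the same shape. A minimal sketch of that payload, assuming a hypothetical `api.sakana.ai/v1` endpoint and `fugu-mini` model id (neither is confirmed in Sakana's announcement):

```python
import json

# Hypothetical values -- Sakana has not published endpoint URLs or model ids.
FUGU_BASE_URL = "https://api.sakana.ai/v1"  # assumption, not confirmed
FUGU_MODEL = "fugu-mini"                    # assumption, not confirmed


def build_chat_request(prompt: str, model: str = FUGU_MODEL) -> dict:
    """Build an OpenAI-style /chat/completions request body.

    Because Fugu's API follows the OpenAI format, this is the same shape
    a client already sending requests to GPT, Claude, or Gemini proxies
    produces; only the base URL and the model id change.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


payload = build_chat_request("Summarize this pull request.")
print(json.dumps(payload, indent=2))
```

Whether Fugu also mirrors OpenAI's optional parameters (tool calls, streaming, temperature) is something beta testers will have to verify.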

The benchmark, with an asterisk

Fugu Ultra hits 54.2 on SWE-Pro, 95.1 on GPQA-D, and 93.2 on LiveCodeBench v6, edging out Opus 4.6, Gemini 3.1, and GPT 5.4 on each. The margins are narrow, especially on SWE-Pro.

A footnote in Sakana's table deserves a closer look. The Opus 4.6 SWE-Pro score of 53.4 is, per Sakana, Anthropic's self-reported number from a custom scaffold; Sakana ran its own evaluations on the mini-swe-agent scaffold but kept getting timeouts on Opus, so they imported Anthropic's figure instead. Not an unreasonable workaround. It also means the headline "Fugu beats Opus" comparison is between two slightly different evaluation setups, which is the kind of detail benchmark tables tend to absorb.

Recursion as a compute axis

Forget the leaderboard for a moment. The technically interesting part is that Fugu can call itself.

According to Trinity and Conductor, the two ICLR 2026 papers Fugu is built on, the system can recursively read its own output, decide its first coordination strategy was bad, and spin up a corrective workflow. Recursion depth becomes a knob you turn at inference time, no retraining required. A small router model, by reading what it just produced, reaches answers neither it nor any single worker could have produced in one pass.
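Sakana hasn't published Fugu's routing code, but the idea of recursion depth as an inference-time knob can be shown with a toy router: dispatch a task to a cheap worker, inspect the output, and if a check fails, re-plan one level deeper, up to a caller-supplied limit. Every name here is illustrative, not Sakana's API:

```python
from typing import Callable


def quick_worker(task: str) -> str:
    # Cheap first attempt that sometimes comes back incomplete:
    # here it truncates the answer to half the input length.
    return task.upper()[: len(task) // 2]


def careful_worker(task: str) -> str:
    # Slower attempt the router only invokes when it recurses.
    return task.upper()


def route(task: str, max_depth: int, check: Callable[[str, str], bool],
          depth: int = 0) -> str:
    """Dispatch a task, read the output, and recurse if it looks wrong.

    `max_depth` is the inference-time knob: raising it buys more
    corrective passes at more cost, with no retraining of the router.
    """
    worker = quick_worker if depth == 0 else careful_worker
    result = worker(task)
    if depth < max_depth and not check(task, result):
        # The router reads its own output, decides the first plan was
        # bad, and spins up a corrective workflow one level deeper.
        return route(task, max_depth, check, depth + 1)
    return result


complete = lambda task, out: len(out) == len(task)
print(route("hello world", max_depth=2, check=complete))  # → HELLO WORLD
print(route("hello world", max_depth=0, check=complete))  # → HELLO
```

With `max_depth=0` the router must accept its first, truncated answer; allowing recursion lets the same router reach an output its first pass could not.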

Whether this matters in production is a different question. Recursive scaling means recursive cost, and the "tens of times cheaper" pitch lives or dies on how often Ultra has to recurse to actually beat the underlying models.
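Since pricing is unpublished, any cost model is speculative, but the break-even logic is plain arithmetic: if corrective passes re-invoke workers, expected cost scales with the average number of passes per request. A sketch with entirely made-up numbers:

```python
def effective_cost(base_cost: float, recursion_rate: float,
                   passes_when_recursing: int) -> float:
    """Expected cost per request when some fraction of requests recurse.

    base_cost: cost of one orchestration pass (arbitrary units).
    recursion_rate: fraction of requests that need corrective passes.
    passes_when_recursing: total passes consumed by those requests.
    """
    return base_cost * ((1 - recursion_rate)
                        + recursion_rate * passes_when_recursing)


# Illustrative only: suppose one Fugu pass were 50x cheaper than calling
# the frontier models directly. How fast does recursion erode that edge?
direct_cost = 50.0   # hypothetical cost of running the models yourself
single_pass = 1.0    # hypothetical cost of one Fugu pass
for rate in (0.0, 0.2, 0.5):
    cost = effective_cost(single_pass, rate, passes_when_recursing=4)
    print(f"recursion rate {rate:.0%}: {direct_cost / cost:.1f}x cheaper")
```

Under these invented numbers, a 50% recursion rate with four passes per recursing request shrinks a 50x advantage to 20x, still "tens of times" but a very different margin, which is why the recursion frequency in real workloads is the figure worth measuring.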

The beta application form is open now. Sakana says it wants testers using coding assistants like OpenCode and Codex, plus engineering and business projects where the multi-model angle might pay off.

Tags: sakana ai, fugu, ai orchestration, multi-agent systems, llm router, iclr 2026, ai benchmarks, frontier models, ai agents
Former software engineer turned tech writer, Oliver has spent the last five years tracking the AI landscape. He brings a practitioner's eye to the hype cycles and genuine innovations defining the field, helping readers separate signal from noise.
