Nvidia Nemotron 3 Ultra Tops US Open Models, Trails China

Jensen Huang stood on the Computex stage in Taipei on June 1, leather jacket and all, and announced Nemotron 3 Ultra: a 550-billion-parameter open-weight model that Nvidia says is the smartest open model built in America. The catch, which Nvidia's own benchmarking partner makes plain, is that America's smartest open model still loses to China's.

The numbers, and who's counting them

Nemotron 3 Ultra scores 48 on the Artificial Analysis Intelligence Index. That puts it comfortably ahead of the next US open models in line: Gemma 4 31B at 39, Nvidia's own mid-range Nemotron 3 Super at 36, and gpt-oss-120b at 33. It is worth remembering that Artificial Analysis ran these evaluations in partnership with Nvidia, so treat the framing accordingly, though the index itself is independent and the score lands where it lands.

Then there's Kimi K2.6 from Moonshot, sitting at 54. Six points may not sound like much. It is. Kimi was released back in April and ranks fourth among all models, open or closed, three points behind the frontier set by Anthropic, Google, and OpenAI. The current closed-model leader, Anthropic's Opus 4.8, scores 61. So the gap Nvidia narrowed is the US-China open-weights gap, not the gap to the actual frontier.
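For readers who want the gaps laid out, here is a minimal sketch that tabulates the index scores quoted above relative to Ultra. The scores come from this article; the function name `gap_to` is just an illustrative helper.

```python
# Artificial Analysis Intelligence Index scores as quoted in this article.
SCORES = {
    "Opus 4.8": 61,
    "Kimi K2.6": 54,
    "Nemotron 3 Ultra": 48,
    "Gemma 4 31B": 39,
    "Nemotron 3 Super": 36,
    "gpt-oss-120b": 33,
}


def gap_to(reference: str, scores: dict = SCORES) -> dict:
    """Return each model's lead (+) or deficit (-) in index points versus the reference."""
    base = scores[reference]
    return {name: score - base for name, score in scores.items() if name != reference}


# Print every other model's distance from Ultra, signed.
for name, delta in gap_to("Nemotron 3 Ultra").items():
    print(f"{name:18s} {delta:+d}")
```

Read that way, Ultra's lead over the next US open model (nine points) is larger than its deficit to Kimi (six), and the distance to the closed-model leader (13) is larger than either.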

Where it actually wins

Speed. On a pre-release DeepInfra endpoint, Ultra served over 300 tokens per second. The Chinese models in its intelligence class, DeepSeek and Kimi, run at 50 to 100 tokens per second through their commercial APIs today. That is a three-to-six-times difference, and for autonomous agents grinding through long multi-step tasks where every step's latency stacks up, it is the kind of difference that shows up in a bill.
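To see how that stacks up over an agent run, here is a back-of-envelope sketch. The throughput figures are the ones quoted above; the workload (40 sequential steps of 1,500 output tokens each) is a made-up assumption, and the estimate ignores prompt processing, network time, and tool calls.

```python
def generation_seconds(steps: int, tokens_per_step: int, tokens_per_second: float) -> float:
    """Time spent purely on decoding output tokens across sequential agent steps."""
    return steps * tokens_per_step / tokens_per_second


# Hypothetical workload, not from the article.
STEPS, TOKENS_PER_STEP = 40, 1_500

# Throughput figures quoted in the article (Ultra's is "over 300", so 300 is a floor).
ENDPOINTS = [("Ultra at 300 tok/s", 300), ("rival at 100 tok/s", 100), ("rival at 50 tok/s", 50)]

for label, tps in ENDPOINTS:
    secs = generation_seconds(STEPS, TOKENS_PER_STEP, tps)
    print(f"{label:20s} {secs / 60:5.1f} min")
```

Under those assumptions the same task costs a little over three minutes of pure generation on Ultra, against ten to twenty minutes on the slower endpoints. Real numbers will depend heavily on the workload and the provider.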

The model pulls this off with a hybrid Mamba-Transformer mixture-of-experts design. Of those 550 billion parameters, only 55 billion fire on any given token, roughly a tenth, which is the whole point of MoE and the reason a model this large can run cheaply. Nvidia pairs that with a million-token context window and 4-bit NVFP4 training. The Nemotron 3 family debuted last December with the smaller Nano variant; Super arrived in March at 120 billion parameters.
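The sparse-activation idea can be illustrated with a toy top-k router. This is a generic sketch of mixture-of-experts gating, not Nvidia's routing scheme, which the article does not describe; the expert counts are hypothetical and chosen only to mirror the one-in-ten ratio.

```python
import math
import random


def top_k_route(logits: list, k: int) -> dict:
    """Pick the k highest-scoring experts and softmax-normalise their weights."""
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in chosen)  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - peak) for i in chosen}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


# Hypothetical sizes, not from the article.
NUM_EXPERTS, ACTIVE_EXPERTS = 40, 4

rng = random.Random(0)
router_logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # stand-in for a learned router
weights = top_k_route(router_logits, ACTIVE_EXPERTS)

print(f"experts used for this token: {sorted(weights)}")
print(f"fraction of experts active:  {len(weights) / NUM_EXPERTS:.0%}")
print(f"article's active-parameter share (55B of 550B): {55 / 550:.0%}")
```

Only the selected experts run for a given token, so compute per token tracks the active count rather than the total. In a real model the active-parameter share also includes layers every token passes through, so it need not equal the fraction of experts chosen.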

Nvidia's own slides tell a less triumphant story than the keynote. Ultra wins on instruction following, professional tasks, and long context. It trails on coding and long-horizon planning, the exact areas where Kimi K2.6 and GLM 5.1 still hold an edge. For a model pitched at agentic workflows, losing on long-horizon planning is not a footnote.

One asterisk on "available"

What ships first is a base checkpoint, not a finished assistant. It has not been instruction-tuned or aligned, which means you cannot drop it into production and expect it to behave. Nvidia calls it the best starting point for fine-tuning. A fully post-trained version is expected to follow, though Nvidia hasn't said when.

The checkpoint lands June 4 on Hugging Face, ModelScope, OpenRouter, and build.nvidia.com, the last as an NVIDIA NIM microservice. Don't expect to run it on your laptop. A 550-billion-parameter model wants datacenter GPUs, ideally Nvidia's, which is rather the point of the whole exercise.
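A rough sense of why the laptop is out: the sketch below estimates the memory needed just to hold 550 billion weights at a few storage precisions. The precisions are illustrative assumptions, since the article does not say how the released checkpoint is stored, and the figures exclude KV cache, activations, and any quantization scale overhead.

```python
def weight_gigabytes(params: int, bits_per_param: float) -> float:
    """Decimal gigabytes needed just to hold the weights at a given precision."""
    return params * bits_per_param / 8 / 1e9


TOTAL_PARAMS = 550_000_000_000  # total parameter count from the article

# Illustrative precisions; the checkpoint's actual storage format is not stated in the article.
for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label:7s} {weight_gigabytes(TOTAL_PARAMS, bits):7.0f} GB")
```

Even the 4-bit case works out to 275 GB of weights. Sparse activation cuts the compute per token, but every expert still has to be resident in memory, so the footprint follows the full 550 billion.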

Oliver Senti

Senior AI Editor

Former software engineer turned tech writer, Oliver has spent the last five years tracking the AI landscape. He brings a practitioner's eye to the hype cycles and genuine innovations defining the field, helping readers separate signal from noise.
