NVIDIA released Nemotron 3 Super on March 11, a 120-billion-parameter open model built for multi-agent AI systems. Only 12 billion parameters are active during inference, thanks to a Mixture-of-Experts design that pairs Mamba layers (for memory efficiency) with transformer layers (for reasoning). The company claims up to 5x higher throughput and 2x higher accuracy than the previous Nemotron Super, though those figures are self-reported comparisons against its own predecessor, not independent benchmarks.
Speed is the headline number. According to Artificial Analysis, Nemotron 3 Super hits 478 output tokens per second, outpacing OpenAI's similarly sized gpt-oss-120B at 264 tokens per second. NVIDIA calls it "the fastest model in its class," and the benchmarks back that up, at least on throughput. The model supports a 1-million-token context window and runs in NVFP4 precision on Blackwell GPUs, which cuts memory requirements versus FP8 on Hopper.
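The published numbers lend themselves to a quick back-of-envelope check. The script below uses only figures stated above (478 vs. 264 tokens/s, 120B total parameters) and deliberately ignores KV cache, activations, and runtime overhead, so the memory estimates are weights-only lower bounds, not full deployment footprints:

```python
# Back-of-envelope check of the published throughput and precision numbers.
# All inputs come from the article; everything else is simple arithmetic.

NEMOTRON_TPS = 478     # output tokens/s (Artificial Analysis)
GPT_OSS_TPS = 264      # output tokens/s, same benchmark
TOTAL_PARAMS = 120e9   # total parameter count

# Relative throughput on this benchmark.
speedup = NEMOTRON_TPS / GPT_OSS_TPS

# Weights-only memory: NVFP4 stores ~4 bits (0.5 bytes) per parameter,
# FP8 stores 8 bits (1 byte) per parameter.
nvfp4_gb = TOTAL_PARAMS * 0.5 / 1e9
fp8_gb = TOTAL_PARAMS * 1.0 / 1e9

print(f"speedup: {speedup:.2f}x")                       # ~1.81x
print(f"weights: {nvfp4_gb:.0f} GB (NVFP4) vs {fp8_gb:.0f} GB (FP8)")
```

The halved weight footprint is the concrete mechanism behind the "cuts memory requirements versus FP8" claim, though real deployments also carry KV-cache memory that grows with the 1M-token context.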
What makes this release unusual: NVIDIA is publishing the full training pipeline alongside weights. That means over 10 trillion tokens of pre- and post-training data, 15 reinforcement learning environments, and evaluation recipes. Checkpoints ship in BF16, FP8, and NVFP4 on Hugging Face. The blog post positions this as a middle tier between last December's 30B Nano and a 500B Ultra model expected later in 2026.
Enterprise partners are already lined up. Perplexity, Google Cloud Vertex AI, Oracle Cloud, and a dozen inference providers offer access from day one. The technical blog details cookbooks for vLLM and SGLang. GTC kicks off next week, and an Ultra announcement there would not be surprising.
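The cookbooks themselves live in NVIDIA's blog; as a rough illustration of what a vLLM deployment of a model like this typically looks like, here is a hedged sketch. The Hugging Face model ID and the flag values are assumptions for illustration, not taken from the cookbooks:

```shell
# Hypothetical vLLM launch for Nemotron 3 Super; the model ID and flag
# values below are illustrative assumptions -- consult NVIDIA's published
# cookbook for the real ones. --tensor-parallel-size shards weights across
# GPUs; --max-model-len sizes the KV cache for the advertised 1M-token
# context window.
vllm serve nvidia/Nemotron-3-Super \
  --tensor-parallel-size 4 \
  --max-model-len 1000000 \
  --trust-remote-code

# vLLM then exposes an OpenAI-compatible endpoint on port 8000:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "nvidia/Nemotron-3-Super",
       "messages": [{"role": "user", "content": "Hello"}]}'
```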
Bottom Line
Nemotron 3 Super delivers 478 tokens per second with only 12B active parameters, making it the fastest open model in the 120B class according to Artificial Analysis.
Quick Facts
- 120B total parameters, 12B active at inference
- 478 output tokens/sec (measured by Artificial Analysis)
- 1-million-token context window
- 10T+ pretraining tokens and 15 RL environments released openly
- Available on Hugging Face, build.nvidia.com, Perplexity, OpenRouter