NVIDIA released its Nemotron 3 Nano Omni model last week, detailed on the company's release blog. The 30-billion-parameter open multimodal model handles text, images, audio and video in a single inference pass, with only 3 billion parameters active per token through a hybrid Mamba-Transformer mixture-of-experts design.
Throughput is up to 9x higher than other open omni models at comparable interactivity, per NVIDIA's own measurements; that figure is self-reported, not independently verified. The more concrete win is on OSWorld, the GUI navigation benchmark, where Nano Omni scores 47.4 against 11.1 for the previous Nemotron Nano V2 VL.
Architecture details from the technical report: 23 Mamba-2 layers, 23 MoE layers with 128 experts, and 6 grouped-query attention layers. Vision runs through the C-RADIOv4-H encoder; audio uses Parakeet-TDT. The model is English-only at launch.
Training data sourcing is unusually candid. The model card lists Qwen3-VL, Qwen3.5, Qwen2.5-VL and OpenAI's gpt-oss-120b as sources for synthetic captions and reasoning traces. Separately, NVIDIA reports the broader Nemotron family hit over 50 million downloads in the past year.
Three weight formats are live: BF16, FP8 and NVFP4. The model also runs on vLLM, SGLang, llama.cpp and Ollama, and ships as a NIM microservice. Commercial use is permitted under the NVIDIA Open Model License.
Bottom Line
Nano Omni jumps from 11.1 to 47.4 on the OSWorld GUI navigation benchmark versus the previous Nemotron Nano V2 VL.
Quick Facts
- 30 billion total parameters, 3 billion active per token
- OSWorld GUI score: 47.4 vs 11.1 for Nemotron Nano V2 VL
- 9x throughput claim is NVIDIA-reported, not independently verified
- Three weight formats released: BF16, FP8, NVFP4
- License: NVIDIA Open Model License (commercial use permitted)