
Mistral Launches Small 4, a 119B MoE Model Under Apache 2.0

Mistral's new 119B-parameter open-source model unifies instruct, reasoning, and multimodal capabilities.

Andrés Martínez, AI Content Writer
March 17, 2026 · 2 min read
[Image: Abstract visualization of a neural network with branching expert pathways converging into a single output stream]

Mistral released Small 4 on Sunday, a 119-billion-parameter mixture-of-experts model that merges what were previously three separate product lines into one. Instruct, reasoning (formerly Magistral), and agentic coding (Devstral) now live in a single checkpoint, available under Apache 2.0.

The model activates just 6 billion parameters per token across 128 experts (4 active per forward pass), accepts text and image inputs, and supports a 256k context window. A configurable reasoning_effort parameter lets developers toggle between fast chat-style responses and deeper step-by-step reasoning at request time, eliminating the need to route between separate models. Weights are on Hugging Face in FP8, with an NVFP4 quantized checkpoint and a trained EAGLE head for speculative decoding. No base model was published.
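If the reasoning_effort parameter is exposed through an OpenAI-compatible chat endpoint, a request might look like the sketch below. Only the parameter name comes from the announcement; the model identifier, the accepted values ("low"/"high"), and the build_request helper are assumptions for illustration.

```python
import json

def build_request(prompt: str, effort: str = "low") -> dict:
    """Build a hypothetical chat-completions payload.

    "reasoning_effort" is the per-request toggle described above; the
    model name and effort values are placeholders, not confirmed API.
    """
    return {
        "model": "mistral-small-4",  # assumed identifier
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort,  # fast chat vs. deeper reasoning
    }

fast = build_request("Summarize this release note.", effort="low")
deep = build_request("Walk through the proof step by step.", effort="high")
print(json.dumps(fast, indent=2))
```

The appeal of a request-time switch is operational: one deployed checkpoint serves both latency-sensitive chat and slower reasoning traffic, instead of routing between separate models.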

Mistral claims 40% lower latency and 3x throughput gains over Mistral Small 3, though these are company-reported numbers without independent verification. On its own selected benchmarks, the company says Small 4 matches or beats GPT-OSS 120B while producing shorter outputs. The efficiency angle is the real pitch here: on AA LCR, Small 4 reportedly hits 0.72 in 1.6K characters of output where comparable Qwen models need 3.5-4x more text for similar scores.

The architecture uses MLA (Multi-Head Latent Attention), the same DeepSeek V3-derived approach Mistral adopted for Large 3 in late 2025. Minimum hardware is 4x H100 or 2x H200 GPUs. The model is also available day-zero as an NVIDIA NIM container, with support already in vLLM, llama.cpp, and SGLang.
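The sparse activation pattern behind those numbers (4 of 128 experts running per token) amounts to top-k gating: a router scores every expert and only the k best-scoring ones process the token. A minimal sketch, with random scores standing in for the trained router and toy logic in place of real expert networks:

```python
import random

NUM_EXPERTS = 128  # total experts per MoE layer (from the announcement)
TOP_K = 4          # experts actually run per token

def route(token_scores):
    """Pick the TOP_K highest-scoring experts and normalize their weights."""
    ranked = sorted(range(len(token_scores)),
                    key=lambda i: token_scores[i], reverse=True)
    top = ranked[:TOP_K]
    total = sum(token_scores[i] for i in top)
    return {i: token_scores[i] / total for i in top}

random.seed(0)
scores = [random.random() for _ in range(NUM_EXPERTS)]  # stand-in router logits
weights = route(scores)
print(sorted(weights))  # the 4 chosen expert indices; the other 124 stay idle
```

This is why a 119B-parameter model can run with roughly the per-token compute of a ~6B dense model: only the selected experts' weights participate in each forward pass.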


Bottom Line

Mistral Small 4 consolidates three model families into one 119B Apache 2.0 checkpoint with 6B active parameters, configurable reasoning, and multimodal input.

Quick Facts

  • 119B total parameters, 6B active per token (128 experts, 4 active)
  • 256k token context window, text and image input
  • Apache 2.0 license, no base model released
  • FP8 and NVFP4 weights available, plus EAGLE head for speculative decoding
  • Company-reported: 40% latency reduction and 3x throughput vs. Mistral Small 3
Tags: Mistral AI, open-source AI, mixture-of-experts, multimodal AI, Apache 2.0, reasoning models, MLA attention
Andrés Martínez
AI Content Writer

Andrés reports on the AI stories that matter right now. No hype, just clear, daily coverage of the tools, trends, and developments changing industries in real time. He makes the complex feel routine.

