Mistral released Small 4 on Sunday, a 119-billion-parameter mixture-of-experts model that folds three previously separate product lines into one: instruct, reasoning (formerly Magistral), and agentic coding (Devstral) now share a single checkpoint, available under Apache 2.0.
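The active-parameter math comes from sparse expert routing: each token passes through only 4 of the 128 expert feed-forward networks. A minimal sketch of top-k gating, using the expert counts from the announcement (the gating function and names here are illustrative assumptions, not Mistral's implementation):

```python
import math
import random

# Sketch of top-k expert routing in a mixture-of-experts layer, using the
# shape described for Small 4: 128 experts, 4 active per token. The router
# logic below is an assumption for illustration only.

NUM_EXPERTS = 128
TOP_K = 4

def route_token(router_logits):
    """Pick the top-k experts for one token and normalize their gate weights."""
    assert len(router_logits) == NUM_EXPERTS
    top = sorted(range(NUM_EXPERTS), key=lambda i: router_logits[i], reverse=True)[:TOP_K]
    # Softmax over only the selected experts' logits.
    exps = [math.exp(router_logits[i]) for i in top]
    total = sum(exps)
    return list(zip(top, [e / total for e in exps]))

random.seed(0)
logits = [random.gauss(0, 1) for _ in range(NUM_EXPERTS)]
chosen = route_token(logits)
print(len(chosen))  # 4 experts fire; the other 124 stay idle for this token
```

Because only 4 of 128 expert blocks run per forward pass, roughly 6B of the 119B total parameters are active for any given token.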
The model activates just 6 billion parameters per token across 128 experts (4 active per forward pass), accepts text and image inputs, and supports a 256k-token context window. A configurable reasoning_effort parameter lets developers toggle between fast chat-style responses and deeper step-by-step reasoning at request time, removing the need to route between separate models. Weights are on Hugging Face in FP8, alongside an NVFP4-quantized checkpoint and a trained EAGLE head for speculative decoding. No base model was published.
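In practice the per-request toggle means one payload shape for both fast and deliberate modes. A hedged sketch, assuming an OpenAI-compatible chat-completions request body and a placeholder model id; only the reasoning_effort parameter itself comes from the announcement:

```python
import json

# Hypothetical request payloads toggling reasoning depth at request time.
# The chat-completions shape, the model id, and the allowed effort values
# below are assumptions for illustration.

def build_request(prompt: str, effort: str) -> dict:
    assert effort in ("low", "medium", "high")  # assumed allowed values
    return {
        "model": "mistral-small-4",  # placeholder id
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort,
    }

fast = build_request("Summarize this changelog.", "low")
deep = build_request("Prove the loop invariant holds.", "high")
print(json.dumps(fast, indent=2))
```

Same endpoint, same weights; only a per-request knob changes, so there is no router deciding between a chat model and a separate reasoning model.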
Mistral claims 40% lower latency and 3x throughput gains over Mistral Small 3, though these are company-reported numbers without independent verification. On its own selected benchmarks, the company says Small 4 matches or beats GPT-OSS 120B while producing shorter outputs. The efficiency angle is the real pitch here: on AA LCR, Small 4 reportedly hits 0.72 in 1.6K characters of output where comparable Qwen models need 3.5-4x more text for similar scores.
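The output-length claim implies concrete character budgets. A quick back-of-envelope check using only the numbers Mistral reports:

```python
# Small 4 reportedly reaches 0.72 on AA LCR in ~1.6K characters of output,
# while comparable Qwen models need 3.5-4x more text for similar scores.
small4_chars = 1_600
qwen_low = small4_chars * 3.5   # lower bound of the claimed range
qwen_high = small4_chars * 4.0  # upper bound of the claimed range
print(f"Implied Qwen output: {qwen_low:,.0f}-{qwen_high:,.0f} characters")
```

At similar per-token generation cost, shorter outputs translate directly into lower latency and cost per query, which is the efficiency pitch.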
The architecture uses MLA (Multi-Head Latent Attention), the same DeepSeek V3-derived approach Mistral adopted for Large 3 in late 2025. Minimum hardware is 4x H100 or 2x H200 GPUs. The model is also available day-zero as an NVIDIA NIM container, with support already in vLLM, llama.cpp, and SGLang.
Bottom Line
Mistral Small 4 consolidates three model families into one 119B Apache 2.0 checkpoint with 6B active parameters, configurable reasoning, and multimodal input.
Quick Facts
- 119B total parameters, 6B active per token (128 experts, 4 active)
- 256k token context window, text and image input
- Apache 2.0 license, no base model released
- FP8 and NVFP4 weights available, plus EAGLE head for speculative decoding
- Company-reported: 40% latency reduction and 3x throughput vs. Mistral Small 3