
Mistral Launches Small 4, a 119B MoE Model Under Apache 2.0

Mistral's new 119B-parameter open-source model unifies instruct, reasoning, and multimodal capabilities.

Andrés Martínez, AI Content Writer
March 17, 2026 · 2 min read
[Image: Abstract visualization of a neural network with branching expert pathways converging into a single output stream]

Mistral released Small 4 on Sunday, a 119-billion-parameter mixture-of-experts model that merges what were previously three separate product lines into one. Instruct, reasoning (formerly Magistral), and agentic coding (Devstral) now live in a single checkpoint, available under Apache 2.0.

The model activates just 6 billion parameters per token across 128 experts (4 active per forward pass), accepts text and image inputs, and supports a 256k context window. A configurable reasoning_effort parameter lets developers toggle between fast chat-style responses and deeper step-by-step reasoning at request time, eliminating the need to route between separate models. Weights are on Hugging Face in FP8, with an NVFP4 quantized checkpoint and a trained EAGLE head for speculative decoding. No base model was published.
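If the reasoning_effort parameter is exposed through an OpenAI-compatible chat endpoint, a request might look like the sketch below. Only the parameter name comes from the announcement; the model identifier, the accepted values ("low"/"high"), and the build_request helper are assumptions for illustration.

```python
import json

def build_request(prompt: str, effort: str = "low") -> dict:
    """Build a hypothetical chat-completions payload.

    "reasoning_effort" is the per-request toggle described above; the
    model name and effort values are placeholders, not confirmed API.
    """
    return {
        "model": "mistral-small-4",  # assumed identifier
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort,  # fast chat vs. deeper reasoning
    }

fast = build_request("Summarize this release note.", effort="low")
deep = build_request("Walk through the proof step by step.", effort="high")
print(json.dumps(fast, indent=2))
```

The appeal of a request-time switch is operational: one deployed checkpoint serves both latency-sensitive chat and slower reasoning traffic, instead of routing between separate models.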

Mistral claims 40% lower latency and 3x throughput gains over Mistral Small 3, though these are company-reported numbers without independent verification. On its own selected benchmarks, the company says Small 4 matches or beats GPT-OSS 120B while producing shorter outputs. The efficiency angle is the real pitch here: on AA LCR, Small 4 reportedly hits 0.72 in 1.6K characters of output where comparable Qwen models need 3.5-4x more text for similar scores.

The architecture uses MLA (Multi-Head Latent Attention), the same DeepSeek V3-derived approach Mistral adopted for Large 3 in late 2025. Minimum hardware is 4x H100 or 2x H200 GPUs. The model is also available day-zero as an NVIDIA NIM container, with support already in vLLM, llama.cpp, and SGLang.
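The sparse activation pattern behind those numbers (4 of 128 experts running per token) amounts to top-k gating: a router scores every expert and only the k best-scoring ones process the token. A minimal sketch, with random scores standing in for the trained router and toy logic in place of real expert networks:

```python
import random

NUM_EXPERTS = 128  # total experts per MoE layer (from the announcement)
TOP_K = 4          # experts actually run per token

def route(token_scores):
    """Pick the TOP_K highest-scoring experts and normalize their weights."""
    ranked = sorted(range(len(token_scores)),
                    key=lambda i: token_scores[i], reverse=True)
    top = ranked[:TOP_K]
    total = sum(token_scores[i] for i in top)
    return {i: token_scores[i] / total for i in top}

random.seed(0)
scores = [random.random() for _ in range(NUM_EXPERTS)]  # stand-in router logits
weights = route(scores)
print(sorted(weights))  # the 4 chosen expert indices; the other 124 stay idle
```

This is why a 119B-parameter model can run with roughly the per-token compute of a ~6B dense model: only the selected experts' weights participate in each forward pass.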


Bottom Line

Mistral Small 4 consolidates three model families into one 119B Apache 2.0 checkpoint with 6B active parameters, configurable reasoning, and multimodal input.

Quick Facts

  • 119B total parameters, 6B active per token (128 experts, 4 active)
  • 256k token context window, text and image input
  • Apache 2.0 license, no base model released
  • FP8 and NVFP4 weights available, plus EAGLE head for speculative decoding
  • Company-reported: 40% latency reduction and 3x throughput vs. Mistral Small 3
Tags: Mistral AI, open-source AI, mixture-of-experts, multimodal AI, Apache 2.0, reasoning models, MLA attention
Andrés Martínez
AI Content Writer

Andrés reports on the AI stories that matter right now. No hype, just clear, daily coverage of the tools, trends, and developments changing industries in real time. He makes the complex feel routine.

