Mistral AI has released Ministral 3, a family of dense language models built by systematically compressing its Mistral Small 3.1 (24B parameters) into three smaller sizes. The method, which the company calls Cascade Distillation, prunes the 24B parent down to 14B and trains the pruned model to mimic the original; the process then repeats, with the 14B seeding the 8B and the 8B seeding the 3B. Each model ships in base, instruct, and reasoning variants, all with built-in vision capabilities and context windows up to 256K tokens. The technical paper details the full recipe.
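The prune-distill-repeat loop can be sketched in miniature. This is a toy illustration of the cascade structure described above, not Mistral's actual recipe: the magnitude-pruning criterion, the choice of teacher at each stage, and the squared-error "distillation" on a scalar output are all simplifying assumptions made for the sketch.

```python
def prune(weights, new_size):
    """Toy pruning: keep the new_size largest-magnitude weights.
    (Magnitude pruning is an assumption; the paper's criterion may differ.)"""
    return sorted(weights, key=abs, reverse=True)[:new_size]

def distill(student, teacher, lr=0.05, steps=200):
    """Toy distillation: treat each model's scalar 'output' as the sum of
    its weights, and nudge the student until its output matches the
    teacher's. Real distillation matches token distributions, not a scalar."""
    target = sum(teacher)
    student = list(student)
    for _ in range(steps):
        err = sum(student) - target          # student vs. teacher output
        # gradient of 0.5 * err**2 w.r.t. each weight is err; split over n
        student = [w - lr * err / len(student) for w in student]
    return student

def cascade(parent, sizes):
    """Cascade loop: prune, distill, then let the new student seed the
    next stage (teacher choice per stage is an assumption here)."""
    models, teacher = {}, parent
    for size in sizes:
        student = distill(prune(teacher, size), teacher)
        models[size] = student
        teacher = student                    # 14B seeds 8B, 8B seeds 3B
    return models
```

Each stage starts from the previous student rather than from scratch, which is the defining feature of the cascade: compression happens in steps small enough that the pruned model can recover the teacher's behavior.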
The results look competitive on the company's own benchmarks. Ministral 3 14B Base reportedly matches Mistral Small 3.1 Base across most evaluations while being over 40% smaller. The 8B variant outperformed the larger Gemma 3 12B on most tests except TriviaQA, per Mistral's numbers. On AIME 2025, the 14B reasoning variant hit 85% accuracy compared to Qwen 3 14B Thinking's 73.7%. These are self-reported figures, though, and independent verification would strengthen the claims.
One quirk: during pretraining, the smaller Mistral Small 3.1 was a better teacher than the larger Mistral Medium 3. During fine-tuning, that flipped. The team found that distilling from a preference-tuned teacher checkpoint produced stronger students, even after the student went through its own preference optimization.
Everything is Apache 2.0 licensed and available on Hugging Face. API pricing starts at $0.10 per million tokens for the 3B model. Mistral's announcement positions these for edge devices, single-GPU setups, and cost-sensitive production workloads.
The Bottom Line: Ministral 3's 14B variant matches its 24B parent on Mistral's benchmarks while being over 40% smaller, and the whole family runs on a single GPU, with API pricing starting at $0.10/M tokens.
QUICK FACTS
- Model sizes: 3B, 8B, and 14B parameters (nine variants total)
- Parent model: Mistral Small 3.1 (24B parameters)
- Context window: up to 256K tokens (128K for reasoning variants)
- License: Apache 2.0
- API pricing: $0.10 (3B), $0.15 (8B), $0.20 (14B) per million input/output tokens (company-reported)
- AIME 2025: 85% accuracy for 14B reasoning variant (self-reported)