Mistral pushed Medium 3.5 out the door Wednesday: a 128-billion-parameter dense model with a 256k context window. The French lab put the weights on Hugging Face under a modified MIT license and made it the new default in Le Chat and the Vibe coding agent.
The benchmark framing is telling. On its company blog, Mistral reports 77.6% on SWE-Bench Verified and 91.4 on τ³-Telecom, comparing the numbers against Qwen3.5 397B and its own older Devstral 2. Both scores are self-measured. GPT and Gemini do not appear in the charts.
The architecture pick is the other story. Chinese rivals have leaned into mixture-of-experts designs with hundreds of billions or a trillion total parameters, activating only a slice of them per token. Medium 3.5 stays dense, which costs more compute at inference but tends to behave more predictably on long agentic runs. That looks deliberate.
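The trade-off is easy to put in numbers: per-token compute scales roughly with active parameters (about 2 FLOPs per active parameter per token). A rough sketch, where the MoE active-parameter count is a hypothetical round number for illustration, not a published figure for any rival model:

```python
# Rough per-token forward-pass compute: ~2 FLOPs per active parameter.
# The MoE active-parameter figure below is hypothetical, for illustration.
def tflops_per_token(active_params: float) -> float:
    """Approximate TFLOPs spent per generated token."""
    return 2 * active_params / 1e12

dense_active = 128e9  # Medium 3.5: every parameter is active on every token
moe_active = 40e9     # hypothetical MoE activating a slice of a ~400B total

print(tflops_per_token(dense_active))  # 0.256 TFLOPs per token
print(tflops_per_token(moe_active))    # 0.08 TFLOPs per token
```

By this estimate the dense model does several times the arithmetic per token, which is the inference-cost premium Mistral is accepting in exchange for uniform behavior across a long context.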
API pricing is $1.5 per million input tokens and $7.5 per million output. Mistral says the model can be self-hosted on as few as four GPUs, which is the actual pitch here: open weights inside your own infrastructure, without renting capacity from an American hyperscaler. That argument lands harder in Europe than elsewhere.
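At those rates, a long agentic session stays cheap. A back-of-envelope estimate, where the token counts are made-up illustrative values, not vendor figures:

```python
# Cost estimate at the stated Medium 3.5 API pricing.
# Token counts below are illustrative assumptions, not vendor figures.
INPUT_PRICE = 1.5 / 1_000_000   # dollars per input token
OUTPUT_PRICE = 7.5 / 1_000_000  # dollars per output token

def run_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the published rates."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# Hypothetical agentic run: 200k tokens of context in, 20k tokens out.
print(f"${run_cost(200_000, 20_000):.2f}")  # → $0.45
```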
Reasoning effort is now configurable per request, and the vision encoder was retrained from scratch to handle variable image sizes. The full model collection, including FP8 weights and an EAGLE draft model for speculative decoding, is live now. Public preview starts on the Pro, Team and Enterprise plans.
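The four-GPU claim is plausible arithmetic once the FP8 weights are in the picture: at one byte per parameter, the whole model fits comfortably across four cards. A sanity check, where the 80 GB per-GPU figure is an assumption (an H100-class card), while the parameter count and FP8 weights come from the release:

```python
# Sanity check on the self-hosting claim: weight memory at FP8.
# 80 GB per GPU is an assumption (H100-class card); the 128B parameter
# count and FP8 weights are from the release.
params = 128e9        # total parameters (dense, all active)
bytes_per_param = 1   # FP8 = 1 byte per weight
gpu_memory_gb = 80    # assumed per-card memory
gpus = 4

weights_gb = params * bytes_per_param / 1e9
total_gb = gpus * gpu_memory_gb
print(f"weights: {weights_gb:.0f} GB of {total_gb:.0f} GB")       # 128 of 320
print(f"headroom for KV cache: {total_gb - weights_gb:.0f} GB")   # 192 GB
```

The leftover memory matters because a 256k-token context needs a substantial KV cache on top of the weights.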
Bottom Line
Medium 3.5 can be self-hosted on as few as four GPUs, or used through the API at $1.5/$7.5 per million input/output tokens.
Quick Facts
- 128 billion parameters, dense architecture
- 256k token context window
- 77.6% on SWE-Bench Verified (company-reported)
- $1.5 input / $7.5 output per million tokens via API
- Released April 29, 2026 under modified MIT license