MiniMax shipped M3 on June 1, the Shanghai lab's bid to release the first open-weights model bundling three things closed labs treat as table stakes: frontier coding, a 1M-token context window, and native multimodality. The company laid it all out in a research post Monday.
The context trick is a new attention scheme MiniMax calls MSA (MiniMax Sparse Attention). It claims per-token compute at 1M tokens drops to a twentieth of the previous generation's, with prefill more than 9x faster and decoding more than 15x faster. Those are the headline speed numbers, and they're the company's own.
On coding, MiniMax reports 59.0% on SWE-Bench Pro, 66.0% on Terminal-Bench 2.1, and 74.2% on MCP Atlas. The lower marks tell the more honest story: 34.8% on SWE-fficiency, 28.8% on KernelBench Hard. All of it is internal testing using Claude Code as scaffolding, per the methodology notes, so independent confirmation hasn't landed yet.
Alongside the model, MiniMax updated its agent product, MiniMax Code, and refreshed its three-tier Token Plan, running from $20 to $120 a month. The API is live now: calls under 512K input tokens bill at the standard rate, anything above at a long-context premium.
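As a rough illustration of how a two-rate scheme like this plays out, here is a minimal Python sketch. MiniMax's actual per-token prices aren't given here, so the `Rates` values are placeholders. The code also assumes that "512K" means 512,000 tokens, that a call is billed entirely at one rate chosen by its input size, and that a call of exactly 512K tokens counts as long context. None of those details is confirmed by the company.

```python
from dataclasses import dataclass

# Assumption: "512K" is read as 512,000 tokens. The source does not say
# whether it means 512,000 or 524,288 (512 * 1024).
LONG_CONTEXT_THRESHOLD = 512_000


@dataclass(frozen=True)
class Rates:
    """Hypothetical prices per million input tokens (not MiniMax's real rates)."""
    standard_per_million: float
    long_context_per_million: float


def input_cost(input_tokens: int, rates: Rates,
               threshold: int = LONG_CONTEXT_THRESHOLD) -> float:
    """Return the input-side cost of one call under a two-rate scheme.

    Assumptions: the whole call is billed at a single rate picked by its
    input size, and a call of exactly `threshold` tokens is long context.
    """
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    if input_tokens < threshold:
        per_million = rates.standard_per_million
    else:
        per_million = rates.long_context_per_million
    return input_tokens / 1_000_000 * per_million


if __name__ == "__main__":
    # Placeholder prices chosen only to make the arithmetic easy to follow.
    demo = Rates(standard_per_million=1.0, long_context_per_million=2.0)
    for tokens in (100_000, 511_999, 600_000):
        print(tokens, round(input_cost(tokens, demo), 4))
```

Under these assumptions, a call that crosses the threshold pays the premium on every input token, not just the excess. Whether MiniMax bills that way is worth checking against its pricing page once the details are public.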
Weights aren't out yet. MiniMax says the technical report and open model weights arrive over the next 10 days.
Bottom Line
MiniMax reports that M3 scores 59.0% on SWE-Bench Pro in its own internal testing, with weights and a technical report promised within 10 days.
Quick Facts
- Released June 1, 2026
- Context window: up to 1M tokens via MiniMax Sparse Attention (MSA)
- 59.0% SWE-Bench Pro, 66.0% Terminal-Bench 2.1 (company-reported, internal testing)
- 34.8% SWE-fficiency, 28.8% KernelBench Hard, 74.2% MCP Atlas (company-reported)
- Token Plan tiers: $20, $50, $120/month
- Weights and technical report promised within 10 days




