Microsoft MAI-Thinking-1: 1T-Parameter Reasoning Model

Microsoft AI released MAI-Thinking-1, its first frontier-scale reasoning model, on June 2, alongside a 109-page technical report that goes deeper into training methods than most Big Tech labs bother to publish. The model is a sparse Mixture of Experts running 35B active parameters out of roughly 1 trillion total, with a 256k-token context window. It is not open source, and for now it is available only to select early partners.
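For a sense of how sparse that is, here is a back-of-the-envelope sketch of the share of parameters doing work on any one token. It assumes the rounded figures above (35B active, about 1T total); the constant and function names are illustrative and do not come from the report.

```python
# Rounded figures from the announcement; exact totals are not given in this article.
ACTIVE_PARAMS = 35e9   # parameters used for a single token
TOTAL_PARAMS = 1e12    # approximate total across all experts


def active_fraction(active: float, total: float) -> float:
    """Share of the model's parameters that take part in one forward pass."""
    return active / total


fraction = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
print(f"{fraction:.1%} of parameters active per token")
```

On those rounded numbers, only a few percent of the weights are touched per token, which is the point of a sparse design: total capacity grows without per-token compute growing at the same rate.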

The numbers Microsoft wants you to look at

On benchmarks, the blog post leads with 97.0% on AIME 2025 and 52.8% on SWE-Bench Pro. Microsoft also claims the model is preferred to Sonnet 4.6 in blind human side-by-side tests, an evaluation run with rater pool partner Surge across 1,276 tasks. Human preference results are the easiest benchmark to dress up, and a win on a curated task set says less than the AIME figure does. The math scores are harder to wave away.

The more interesting claim is what is missing from the training pipeline. Microsoft says it trained the model from the ground up on clean, commercially licensed data, with no distillation from third-party models. The argument in the report is that a model copying another model's reasoning never learns why the reasoning works, so it breaks down during long reinforcement learning runs. Plausible, and mostly unproven. There is an asterisk too: Microsoft self-distills from its own earlier checkpoints to recover from crashed runs, so the no-distillation rule applies to other people's models, not to the technique itself.

How it was actually built

This is where the report earns its reputation. Instead of one training run, Microsoft built what it calls a Hill-Climbing Machine. Three specialist models do the early work, each with its own reward signal: one climbs STEM and competition code, one climbs agentic coding and tool use, one climbs helpfulness and safety. A supervised pass then distills all three into a single model, and a final RL climb produces MAI-Thinking-1.
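The staging described above can be written down as a small dependency graph. This is a sketch of the ordering only, not Microsoft's code: the stage names are hypothetical labels, and the shared "start_checkpoint" that all three specialists branch from is an assumption the article does not confirm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str                 # hypothetical label, not a name from the report
    kind: str                 # "rl" for a reward-driven climb, "sft" for a supervised pass
    inputs: tuple[str, ...]   # stages whose output this stage consumes


SPECIALISTS = ("stem_and_code", "agentic_coding", "helpful_and_safe")

PIPELINE = [
    # Assumption: every specialist starts from one shared checkpoint.
    *(Stage(name, "rl", ("start_checkpoint",)) for name in SPECIALISTS),
    # One supervised pass distills the three specialists into a single model.
    Stage("merged_model", "sft", SPECIALISTS),
    # A final RL climb on the merged model yields the released model.
    Stage("final_model", "rl", ("merged_model",)),
]


def run_order(stages, available=("start_checkpoint",)):
    """Return stage names in an order where every input exists before it is used."""
    done = set(available)
    order = []
    pending = list(stages)
    while pending:
        ready = [s for s in pending if all(i in done for i in s.inputs)]
        if not ready:
            names = ", ".join(s.name for s in pending)
            raise ValueError(f"unsatisfied inputs for: {names}")
        for stage in ready:
            order.append(stage.name)
            done.add(stage.name)
        pending = [s for s in pending if s.name not in done]
    return order


print(run_order(PIPELINE))
```

The three specialist stages have no dependency on one another, so nothing in this graph stops them from running in parallel; only the supervised merge and the final climb are strictly sequential.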

The detail that got technical readers talking is that Microsoft appears to have started RL from a checkpoint with no prior reasoning exposure. That leaves no teacher model to fall back on if the RL run turns unstable. It is a harder cold start than most labs attempt.

Safety was folded into the same reward construction rather than bolted on afterward. Microsoft treats both unsafe compliance and unnecessary refusal as defects, weighted by potential severity of harm, trained with the same RL infrastructure used for capability.
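A two-sided penalty of that kind might look like the sketch below. Everything specific here is an assumption made for illustration: the `Judgement` fields, the 0-to-1 severity scale, and the flat cost for an unnecessary refusal are not from the report, which may weight refusals differently.

```python
from dataclasses import dataclass


@dataclass
class Judgement:
    should_comply: bool   # would a careful reviewer want this request answered?
    complied: bool        # did the model answer it?
    severity: float       # potential harm of a wrongful answer, assumed in [0, 1]


def safety_penalty(j: Judgement, refusal_cost: float = 0.2) -> float:
    """Penalty subtracted from the reward; 0.0 means the behavior was correct."""
    if j.complied and not j.should_comply:
        # Unsafe compliance: penalty scales with how bad the outcome could be.
        return j.severity
    if not j.complied and j.should_comply:
        # Unnecessary refusal: also a defect. A flat cost is an assumption here.
        return refusal_cost
    return 0.0


print(safety_penalty(Judgement(should_comply=False, complied=True, severity=0.9)))
print(safety_penalty(Judgement(should_comply=True, complied=False, severity=0.0)))
```

The design point is that refusing is never free. A model trained only against the first branch can minimize its penalty by refusing everything, which is the failure mode the second branch exists to price in.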

What's not in the announcement

The data section, which starts around page 80 of the report, shows the same licensing tension found in every other major LLM. Press coverage noted that the majority of the web corpus comes from a proprietary crawl of roughly 1.2 trillion pages, filtered with a block list to strip adult and piracy domains. "Commercially licensed" is doing some work in the marketing copy that the report itself complicates.
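Domain block-list filtering of the sort described is simple to sketch, which is part of why it says little about licensing. The version below is a generic illustration, not the report's filter: the blocked domains are placeholders, and matching on subdomains is an assumption.

```python
from urllib.parse import urlsplit

# Placeholder entries; the actual block list is not published in this article.
BLOCKED_DOMAINS = {"piracy.example", "adult.example"}


def is_blocked(url: str, blocked=BLOCKED_DOMAINS) -> bool:
    """True if the URL's host is a blocked domain or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in blocked)


def filter_pages(urls):
    """Keep only the URLs whose host is not on the block list."""
    return [u for u in urls if not is_blocked(u)]


crawl = [
    "https://news.example/story",
    "https://cdn.piracy.example/file",
    "https://notpiracy.example/page",
]
print(filter_pages(crawl))
```

A filter like this removes categories of sites. It does nothing to establish whether the pages that remain were licensed for training.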

Microsoft trained the model on its own in-house infrastructure, co-designed with its accelerators, though the specific cluster size circulating in early write-ups isn't something I could confirm against the report. MAI-Thinking-1 is one of seven MAI models announced at Build, including MAI-Code-1-Flash, a smaller 137B model headed for GitHub Copilot in VS Code.

The model isn't on public leaderboards yet, so independent verification of the benchmark claims is still pending. Third-party access is rolling out through OpenRouter, fal, and Baseten.

Oliver Senti
