
Microsoft Ships MAI-Thinking-1, a 1T-Parameter Reasoning Model Built From Scratch

Microsoft AI's first frontier reasoning model was trained without third-party distillation and ships with a 109-page technical report.

Oliver Senti, Senior AI Editor
June 6, 2026

Microsoft AI released MAI-Thinking-1 on June 2, its first frontier-scale reasoning model, alongside a 109-page technical report that goes deeper into training methods than most Big Tech labs bother to publish. The model is a sparse Mixture of Experts running 35B active parameters out of roughly 1 trillion total, with a 256k-token context window. It is not open source, and for now it is available only to select early partners.
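
For a sense of scale, 35B active out of roughly 1T total means only about 3.5% of the weights run on any given token. The sketch below shows that arithmetic plus a generic top-k softmax router, the textbook way a sparse MoE layer picks which experts to run. The article does not say how many experts MAI-Thinking-1 has, what k is, or how its router works, so the expert count, k, and the toy experts here are all assumptions.

```python
import math

# Figures quoted in the article; everything else below is hypothetical.
ACTIVE_PARAMS = 35e9
TOTAL_PARAMS = 1e12  # "roughly 1 trillion"


def active_fraction(active: float, total: float) -> float:
    """Share of the model's weights that run for a single token."""
    return active / total


def top_k_route(router_logits: list[float], k: int) -> dict[int, float]:
    """Softmax over router logits, keep the k highest-scoring experts,
    and renormalize their gate weights so they sum to 1."""
    m = max(router_logits)
    exps = [math.exp(x - m) for x in router_logits]  # numerically stable softmax
    z = sum(exps)
    probs = [e / z for e in exps]
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return {i: probs[i] / norm for i in chosen}


def moe_forward(x: float, experts, router_logits: list[float], k: int) -> float:
    """Combine only the selected experts' outputs, weighted by their gates.
    Unselected experts are never called, which is where the compute saving comes from."""
    gates = top_k_route(router_logits, k)
    return sum(g * experts[i](x) for i, g in gates.items())


if __name__ == "__main__":
    print(f"{active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS):.1%} of weights active per token")
    toy_experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]  # hypothetical experts
    print(moe_forward(1.0, toy_experts, [0.1, 2.0, 0.3, 1.5], k=2))
```

Total parameter count sets memory footprint; active count sets per-token compute, which is why MoE models can be this large without a proportional inference bill.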

The numbers Microsoft wants you to look at

On benchmarks, Microsoft's announcement post leads with 97.0% on AIME 2025 and 52.8% on SWE-Bench Pro. Microsoft also claims the model is preferred over Anthropic's Sonnet 4.6 in blind, side-by-side human evaluations run with rater-pool partner Surge across 1,276 tasks. Human-preference results are the easiest kind of benchmark to dress up, and a win on a curated task set says less than the AIME figure does. The math scores are harder to wave away.
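
Sample size is not the weak point here. A quick check with a standard Wilson score interval shows that at 1,276 tasks, even a dead-even 50/50 split carries a 95% margin of only about ±2.7 percentage points. The 638-win input is purely illustrative, not Microsoft's reported result, and the calculation assumes tasks are independent, which a curated set may not satisfy.

```python
import math


def wilson_interval(wins: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    p = wins / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


N_TASKS = 1276  # task count reported for the Surge side-by-side

if __name__ == "__main__":
    # Hypothetical 50% preference rate, used only to show the interval width.
    lo, hi = wilson_interval(N_TASKS // 2, N_TASKS)
    print(f"50% win rate, n={N_TASKS}: [{lo:.3f}, {hi:.3f}]")
```

Statistical noise is small at this n; what no interval can capture is how the 1,276 tasks were selected.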

The more interesting claim is what's missing from the training pipeline. Microsoft says it trained the model from the ground up on clean, commercially licensed data, with no distillation from third-party models. The report argues that a model copying another model's reasoning never learns why that reasoning works, so it breaks down during long reinforcement learning (RL) runs. Plausible, and mostly unproven. There is an asterisk, too: Microsoft does self-distill from its own earlier checkpoints to recover from crashed runs, so the no-distillation rule applies to other labs' models, not to the technique itself.
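
In the generic sense, distillation trains a student model to match a teacher's output distribution. Below is a minimal sketch of the classic soft-target loss (KL divergence on temperature-softened outputs, scaled by T²). In Microsoft's self-distillation case the "teacher" would be an earlier MAI checkpoint rather than another lab's model; the article does not give the actual loss or temperature, so both are assumptions, and the logits are made up.

```python
import math


def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Temperature-scaled softmax; higher T flattens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    return [e / z for e in exps]


def distill_loss(student_logits: list[float], teacher_logits: list[float],
                 temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2.
    The student is pushed to match the teacher's whole distribution,
    not just its top answer."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


if __name__ == "__main__":
    teacher = [2.0, 0.5, -1.0]  # hypothetical logits from an earlier checkpoint
    student = [1.0, 1.0, 0.0]   # hypothetical logits from the recovering run
    print(distill_loss(student, teacher))
```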

How it was actually built

This is where the report earns its reputation. Instead of one training run, Microsoft built what it calls a Hill-Climbing Machine. Three specialist models do the early work, each with its own reward signal: one climbs STEM and competition code, one climbs agentic coding and tool use, one climbs helpfulness and safety. A supervised pass then distills all three into a single model, and a final RL climb produces MAI-Thinking-1.
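
The pipeline has three stages: parallel specialist RL climbs, a supervised merge, and a final RL climb. One plausible way to do the merge is to route each training prompt to the specialist for its domain and keep that specialist's answer as the supervised target. The sketch below shows only that data-routing step; the `Specialist` type, the domain labels, and the routing rule are illustrative assumptions, not details taken from the report.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-in: a "model" is any function from prompt to response.
Model = Callable[[str], str]


@dataclass
class Specialist:
    domain: str   # illustrative labels: "stem", "agentic", "helpful_safe"
    model: Model  # the checkpoint produced by that domain's own RL climb


def build_merge_dataset(prompts: list[tuple[str, str]],
                        specialists: list[Specialist]) -> list[tuple[str, str]]:
    """Route each (domain, prompt) pair to the specialist that owns that domain
    and keep its answer as the supervised target for the single merged model."""
    teachers = {s.domain: s.model for s in specialists}
    dataset = []
    for domain, prompt in prompts:
        if domain not in teachers:
            raise KeyError(f"no specialist for domain {domain!r}")
        dataset.append((prompt, teachers[domain](prompt)))
    return dataset


if __name__ == "__main__":
    specialists = [
        Specialist("stem", lambda p: f"[stem] {p}"),
        Specialist("agentic", lambda p: f"[agentic] {p}"),
        Specialist("helpful_safe", lambda p: f"[helpful_safe] {p}"),
    ]
    prompts = [("stem", "Integrate x^2"), ("agentic", "Fix the failing test")]
    for prompt, target in build_merge_dataset(prompts, specialists):
        print(prompt, "->", target)
```

After a supervised fine-tune on a dataset like this, the merged model would enter the final RL climb; that last stage isn't modeled here.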

The detail that got technical readers talking is that Microsoft appears to have started RL from a checkpoint with no prior reasoning exposure. That means no teacher model to fall back on during an already unstable RL run, a harder cold start than most labs attempt.

Safety was folded into the same reward construction rather than bolted on afterward. Microsoft treats both unsafe compliance and unnecessary refusal as defects, weights them by the potential severity of harm, and trains against them with the same RL infrastructure it uses for capabilities.
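
A toy reward function makes the idea concrete: refusing is not a free "safe default", because over-refusal is penalized too. This is a minimal sketch under stated assumptions; the outcome categories, the severity score in [0, 1], and the penalty weights are all hypothetical, since the article does not describe Microsoft's actual weighting.

```python
from enum import Enum


class Outcome(Enum):
    HELPFUL = "helpful"                      # complied with a benign request
    CORRECT_REFUSAL = "correct_refusal"      # declined a genuinely harmful request
    UNSAFE_COMPLIANCE = "unsafe_compliance"  # defect: helped with something harmful
    OVER_REFUSAL = "over_refusal"            # defect: refused something benign


# Hypothetical weights: how costly each defect is, before severity scaling.
DEFECT_SCALE = {Outcome.UNSAFE_COMPLIANCE: 5.0, Outcome.OVER_REFUSAL: 1.0}


def safety_reward(outcome: Outcome, severity: float) -> float:
    """Reward for one graded response.

    severity is an assumed score in [0, 1] for how much is at stake (for
    over-refusal, how costly it is to withhold the help). Both defect types
    get a negative reward that grows with severity.
    """
    if not 0.0 <= severity <= 1.0:
        raise ValueError("severity must be in [0, 1]")
    if outcome in (Outcome.HELPFUL, Outcome.CORRECT_REFUSAL):
        return 1.0
    return -DEFECT_SCALE[outcome] * (1.0 + severity)
```

Because the output is a plain scalar, it can feed the same RL loop as a capability reward, which is the design point the article highlights.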

What's not in the announcement

The data section, which starts around page 80 of the report, shows the same licensing tension found in every other major LLM. Early coverage noted that most of the web corpus comes from a proprietary crawl of roughly 1.2 trillion pages, filtered with a block list that strips adult and piracy domains. "Commercially licensed" is doing a lot of work in the marketing copy, and the report itself complicates it.
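
Domain block-list filtering at crawl scale usually comes down to matching each URL's host, and every parent domain of it, against a set. The sketch below is a generic version of that technique; the report's actual list, matching rules, and tooling aren't described in the article, so the blocked domains here are placeholders.

```python
from urllib.parse import urlsplit

# Placeholder block list; the real list is not described in the article.
BLOCKED_DOMAINS = {"adult-example.test", "piracy-example.test"}


def host_of(url: str) -> str:
    """Lowercased hostname without a trailing dot ('' if the URL has none)."""
    return (urlsplit(url).hostname or "").rstrip(".")


def is_blocked(url: str, blocked: set[str] = BLOCKED_DOMAINS) -> bool:
    """True if the host is a blocked domain or a subdomain of one.
    Checks every suffix: a.b.example.test, b.example.test, example.test, test."""
    labels = host_of(url).split(".")
    return any(".".join(labels[i:]) in blocked for i in range(len(labels)))


def filter_crawl(urls: list[str]) -> list[str]:
    """Keep only URLs whose host is not on the block list."""
    return [u for u in urls if not is_blocked(u)]
```

Matching on label boundaries rather than raw string suffixes avoids false positives such as `notadult-example.test` being caught by `adult-example.test`.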

Microsoft trained the model on its own in-house infrastructure, co-designed with its accelerators, though the specific cluster size circulating in early write-ups isn't something I could confirm against the report. MAI-Thinking-1 is one of seven MAI models announced at Build, including MAI-Code-1-Flash, a smaller 137B model headed for GitHub Copilot in VS Code.

The model isn't on public leaderboards yet, so independent verification of the benchmark claims is still pending. Third-party access is rolling out through OpenRouter, fal, and Baseten.

Former software engineer turned tech writer, Oliver has spent the last five years tracking the AI landscape. He brings a practitioner's eye to the hype cycles and genuine innovations defining the field, helping readers separate signal from noise.
