Ant Group's inclusionAI team has released Ling-2.5-1T, a trillion-parameter open-source model that activates 63 billion parameters per token. It ships under the MIT license. The model upgrades the previous Ling-1T across architecture, training data (29 trillion tokens, up from 20T), and post-training alignment, positioning it as the most capable "instant" (non-thinking) model in the Ling family.
The architecture is the headline change. Ling-2.5 replaces the grouped-query attention of Ling 2.0 with a hybrid setup: a 1:7 ratio of multi-head latent attention to Lightning Linear Attention. The practical result, per Ant Group's own benchmarks, is 3x higher decode throughput on sequences over 32K tokens compared to the previous generation. Context extends to 1 million tokens via YaRN scaling. On the BFCL-V4 benchmark for tool calling, the model claims leading open-source performance, and it has been trained with agentic RL to work natively with platforms like Claude Code and OpenCode.
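For context on what YaRN scaling does: it extends a RoPE model's context window by selectively rescaling rotary frequencies instead of retraining from scratch. Low-frequency dimensions (long wavelengths) get interpolated by the full extension factor, high-frequency dimensions are left alone, and a ramp blends the region in between. The sketch below illustrates that idea only; the head dimension, RoPE base, 32K native window, and ramp cutoffs are illustrative assumptions, not Ling's published configuration:

```python
import math

def yarn_freqs(dim=64, base=10000.0, orig_ctx=32_768, target_ctx=1_048_576):
    """Toy YaRN-style frequency rescaling (all constants are assumptions).

    Returns per-dimension rotary frequencies where low-frequency dims are
    interpolated by scale = target_ctx / orig_ctx, high-frequency dims are
    untouched, and a linear ramp blends the middle region.
    """
    scale = target_ctx / orig_ctx  # 32x extension in this sketch
    out = []
    for i in range(dim // 2):
        freq = base ** (-2 * i / dim)
        wavelen = 2 * math.pi / freq
        if wavelen < orig_ctx / 32:        # high frequency: keep as-is
            gamma = 0.0
        elif wavelen > orig_ctx:           # low frequency: fully interpolate
            gamma = 1.0
        else:                              # linear ramp between the two regimes
            gamma = (wavelen - orig_ctx / 32) / (orig_ctx - orig_ctx / 32)
        out.append(freq * ((1 - gamma) + gamma / scale))
    return out
```

The design point is that naive position interpolation divides every frequency by the scale factor, which blurs local token distinctions; the per-dimension ramp is what lets long-context extension preserve short-range behavior.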
The long-context results are where things get interesting. Ling-2.5-1T beats Kimi K2.5 and DeepSeek V3.2 on RULER and MRCR benchmarks (averaged across 16K to 256K windows), and scores perfectly on needle-in-a-haystack tests up to 1M tokens. The team openly acknowledges a gap remains against GPT-5.2 and Gemini 3 Pro on multi-step long-horizon tasks. That kind of candor is unusual in a model card.
A composite reward mechanism combining correctness and "process redundancy" lets the model match the reasoning quality of thinking models that burn roughly 4x more output tokens, according to Ant Group's self-reported numbers. Independent verification hasn't surfaced yet. Weights are available on ModelScope for users in mainland China. No API pricing has been announced for this version.
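As described, the mechanism implies a scalar reward that pays for a correct answer while docking points for redundant reasoning tokens. A toy sketch of that trade-off, where the function name, token budget, and penalty weight are all illustrative assumptions rather than Ant Group's published method:

```python
def composite_reward(is_correct: bool, output_tokens: int,
                     token_budget: int = 2048,
                     redundancy_weight: float = 0.3) -> float:
    """Toy composite reward: correctness minus a process-redundancy penalty.

    Tokens spent beyond the budget are penalized (capped at the full
    weight), nudging the policy toward answers that are both right and
    short. All constants here are illustrative assumptions.
    """
    correctness = 1.0 if is_correct else 0.0
    # Fractional overshoot past the budget, clamped to [0, 1]
    overflow = max(0, output_tokens - token_budget) / token_budget
    return correctness - redundancy_weight * min(overflow, 1.0)
```

Under these toy constants, a concise correct answer scores 1.0 while a correct answer that burns 4x the budget scores 0.7, which is the kind of pressure that would push a policy toward thinking-model accuracy without thinking-model verbosity.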
Bottom Line
Ling-2.5-1T, released under the MIT license, activates 63B of its 1T parameters per token and claims to match thinking-model reasoning at one-quarter the token cost, though the benchmarks are self-reported.
Quick Facts
- 1 trillion total parameters, 63B active per token
- 29 trillion pre-training tokens (up from 20T in Ling-1T)
- Context window: up to 1M tokens via YaRN scaling
- MIT license, weights on Hugging Face and ModelScope
- Still trails GPT-5.2 and Gemini 3 Pro on long-horizon tasks (company-reported)