Ant Group's research arm pushed Ling-2.6-flash, an open-source mixture-of-experts instruct model that totals 104 billion parameters but activates only 7.4 billion on any given forward pass, to Hugging Face this week. Mirror weights also went up on ModelScope. The license is MIT.
The architecture continues the direction set by Ling 2.5: a 1:7 ratio of full MLA attention layers to Lightning Linear attention layers, stacked on top of a sparse MoE backbone. Ant says this gets inference to 340 tokens per second on a 4x H20 setup, with prefill and decode throughput up roughly 4x against comparable peers. Those numbers are company-reported, measured on Ant's own hardware.
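For intuition, here is a minimal, non-causal sketch of what a 1:7 interleave of full attention and kernelized linear attention can look like. The module names, the elu+1 feature map, and the use of standard multi-head attention as a stand-in for MLA are illustrative assumptions; the release does not detail the Lightning Linear internals.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LinearAttention(nn.Module):
    """Generic kernelized linear attention (elu+1 feature map), O(n) in
    sequence length -- a stand-in for the Lightning Linear layers, whose
    actual design is not publicly specified here."""
    def __init__(self, d_model: int):
        super().__init__()
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq, d)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k = F.elu(q) + 1, F.elu(k) + 1                 # positive feature map
        kv = torch.einsum("bsd,bse->bde", k, v)           # d x d running summary
        z = 1.0 / (torch.einsum("bsd,bd->bs", q, k.sum(1)) + 1e-6)
        return self.out(torch.einsum("bsd,bde,bs->bse", q, kv, z))

def build_stack(n_layers: int, d_model: int) -> nn.ModuleList:
    # One full-attention layer per seven linear layers -> the stated 1:7 ratio.
    return nn.ModuleList(
        nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)
        if i % 8 == 0 else LinearAttention(d_model)
        for i in range(n_layers)
    )
```

The point of the hybrid is that most layers pay O(n) rather than O(n²) in sequence length, which is where the claimed prefill and decode throughput gains at a 262K context would come from.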
The independent read is more modest. Artificial Analysis clocks the median provider at 209.8 tokens per second and scores the model 26 on its Intelligence Index. The full benchmark run consumed 15M output tokens and cost $22.90, which AA flags as somewhat verbose against the 7.9M-token average for the suite. The context window is 262K tokens.
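A quick back-of-envelope on that $22.90, assuming the listed API prices applied to the evaluation; the input-token count below is inferred from the stated figures, not reported by AA.

```python
# Reported: 15M output tokens, $22.90 total, at $0.10/M input and $0.30/M output.
output_cost = 15.0 * 0.30                        # 15M output tokens -> $4.50
implied_input_m = (22.90 - output_cost) / 0.10   # -> ~184M input tokens (inferred)
print(f"output: ${output_cost:.2f}; implied input: ~{implied_input_m:.0f}M tokens")
```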
Ant is aiming this squarely at agent workloads: tool use, multi-step planning, the kind of thing that burns tokens fast in long-reasoning systems. The release notes also concede that the model still hallucinates tool calls in complex scenarios and that bilingual Chinese-English switching needs work.
SGLang and vLLM are both supported at launch, with BF16 and FP8 weights officially provided. Pricing on the lone API provider currently listed sits at $0.10 per million input tokens and $0.30 per million output tokens.
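For reference, a minimal vLLM offline-inference sketch. The Hugging Face repo id below is an assumption based on the release naming; check the model card for the exact id and any required loading flags.

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="inclusionAI/Ling-2.6-flash",  # assumed repo id -- verify on the model card
    dtype="bfloat16",                    # FP8 checkpoints are also published
    trust_remote_code=True,              # may be needed for the custom architecture
)
params = SamplingParams(temperature=0.7, max_tokens=512)
outputs = llm.generate(["Outline a plan to summarize a CSV file."], params)
print(outputs[0].outputs[0].text)
```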
Bottom Line
Ling-2.6-flash activates only 7.4B of its 104B parameters per token and cost $22.90 to evaluate across the full Artificial Analysis Intelligence Index.
Quick Facts
- 104B total parameters, 7.4B active per forward pass
- 262K context window
- 340 tokens/s on 4x H20 (company-reported); 209.8 tokens/s median per Artificial Analysis
- 15M output tokens consumed on the full AA Intelligence Index suite (flagged as verbose vs. the 7.9M-token suite average)
- Pricing: $0.10 / $0.30 per 1M input/output tokens
- MIT license; BF16 and FP8 weights