
Ant Group Open-Sources Ling-2.6-flash, a 104B MoE Model

Ant Group's inclusionAI lab releases an MIT-licensed agent model with 104B params, 7.4B active.

Andrés Martínez, AI Content Writer
April 29, 2026 · 2 min read
[Image: sparse neural network diagram with a small subset of expert nodes glowing while the rest stay dim, suggesting a mixture-of-experts model activating only a fraction of its parameters.]

Ant Group's research arm pushed Ling-2.6-flash to Hugging Face this week: an open-source mixture-of-experts instruct model that totals 104 billion parameters but activates only 7.4 billion on any given forward pass. Mirror weights also went up on ModelScope. The license is MIT.
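The headline numbers follow from how mixture-of-experts models work: a gating network scores many expert sub-networks per token and runs only the top few, so active parameters stay a small fraction of the total. A toy sketch (all shapes and expert counts here are illustrative, not Ling-2.6-flash's real architecture):

```python
import numpy as np

# Toy MoE layer: a gate scores all experts, only the top-k actually run.
# Ling-2.6-flash reportedly activates ~7.4B of 104B params this way;
# the sizes below are made-up toy values.
rng = np.random.default_rng(0)

N_EXPERTS = 64   # toy value
TOP_K = 4        # toy value
D = 16           # toy hidden size

gate_w = rng.normal(size=(D, N_EXPERTS))
experts = rng.normal(size=(N_EXPERTS, D, D))  # one weight matrix per expert

def moe_forward(x: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Route one token through its top-k experts only."""
    scores = x @ gate_w                      # gating logits, one per expert
    top = np.argsort(scores)[-TOP_K:]        # indices of the chosen experts
    weights = np.exp(scores[top])
    weights /= weights.sum()                 # softmax over the chosen k
    out = sum(w * (x @ experts[i]) for w, i in zip(weights, top))
    return out, top

x = rng.normal(size=D)
y, chosen = moe_forward(x)
active_fraction = TOP_K / N_EXPERTS  # 4 of 64 experts computed anything
```

Total parameter count grows with the number of experts, but per-token compute scales only with `TOP_K`, which is the trade-off behind the 104B-total / 7.4B-active split.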

The architecture continues the direction set by Ling 2.5: a 1:7 ratio of MLA and Lightning Linear attention layers, stacked on top of a sparse MoE backbone. Ant says this gets inference to 340 tokens per second on a 4x H20 setup, with prefill and decode throughput up roughly 4x against comparable peers. Those numbers are in-house, measured on Ant's own hardware.

The independent read is more modest. Artificial Analysis clocks the median provider at 209.8 tokens per second and scores the model 26 on its Intelligence Index. The full benchmark run consumed 15M output tokens and cost $22.90 in compute, which AA flags as somewhat verbose against the 7.9M-token average for the suite. Context window is 262K.

Ant is aiming this squarely at agent workloads: tool use, multi-step planning, the kind of thing that burns tokens fast in long-reasoning systems. The release notes also concede the model still hallucinates tool calls in complex scenarios, and that bilingual Chinese-English switching needs work.

SGLang and vLLM are both supported at launch, with BF16 and FP8 weights officially provided. Pricing on the lone API provider currently listed sits at $0.10 per million input tokens, $0.30 per million output.
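At those listed rates, per-request cost is simple arithmetic. A quick calculator (the rates come from the article; the example token counts are hypothetical):

```python
# Cost at the listed API pricing: $0.10 per million input tokens,
# $0.30 per million output tokens.
INPUT_RATE = 0.10 / 1_000_000   # dollars per input token
OUTPUT_RATE = 0.30 / 1_000_000  # dollars per output token

def api_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a request at the listed per-token rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Hypothetical long agent run: 2M input tokens, 500K output tokens.
print(round(api_cost(2_000_000, 500_000), 2))  # 0.35
```

Output-heavy agent workloads pay mostly the $0.30 side, which is why the model's verbosity on benchmarks matters for real-world cost.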


Bottom Line

Ling-2.6-flash activates only 7.4B of its 104B parameters per token and cost $22.90 to evaluate across the full Artificial Analysis Intelligence Index.

Quick Facts

  • 104B total parameters, 7.4B active per forward pass
  • 262K context window
  • 340 tokens/s on 4x H20 (company-reported); 209.8 t/s median per Artificial Analysis
  • 15M output tokens consumed on the full AA Intelligence Index run (vs. the 7.9M-token suite average)
  • Pricing: $0.10 / $0.30 per 1M input/output tokens
  • MIT license; BF16 and FP8 weights
Tags: Ant Group, open source AI, mixture of experts, Ling 2.6, agent models, Hugging Face, MoE
