
Ant Group Launches Ling-2.6-Flash, a 104B MoE Model Tuned for Token Efficiency

Ling-2.6-flash activates 7.4B of 104B parameters. Ant pitches it on token economy, not raw intelligence.

Liza Chan
AI & Emerging Tech Correspondent
April 22, 2026 · 2 min read

Ant Group on Wednesday released Ling-2.6-flash, a 104-billion-parameter mixture-of-experts model that activates only 7.4 billion parameters at inference. The pitch, per Ant's press release, is token economy over raw scale.
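
Ant hasn't published the router details, so as a rough illustration of what "activates 7.4 billion of 104 billion" means in a mixture-of-experts model, here is a generic top-k MoE layer in miniature. Every size, weight, and name below is invented for the sketch, not taken from Ling.

```python
import numpy as np

# Illustrative top-k MoE layer: only the routed experts run, so most of the
# layer's parameters sit idle on any given token. All dimensions are made up.
rng = np.random.default_rng(0)

d_model, d_ff = 64, 256
n_experts, top_k = 16, 2          # 2 of 16 experts active per token

# Each expert is a small two-layer MLP; the gate scores experts per token.
W_in = rng.standard_normal((n_experts, d_model, d_ff)) * 0.02
W_out = rng.standard_normal((n_experts, d_ff, d_model)) * 0.02
W_gate = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_forward(x: np.ndarray) -> np.ndarray:
    """x: (d_model,) activations for one token."""
    logits = x @ W_gate                        # router score for each expert
    top = np.argsort(logits)[-top_k:]          # indices of the k best experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                   # softmax over the chosen experts
    out = np.zeros(d_model)
    for w, e in zip(weights, top):             # only top_k experts execute
        hidden = np.maximum(x @ W_in[e], 0.0)  # ReLU MLP for expert e
        out += w * (hidden @ W_out[e])
    return out

y = moe_forward(rng.standard_normal(d_model))
print(y.shape)  # (64,)
```

Per token, only `top_k` of the `n_experts` expert MLPs execute, which is how a model's total parameter count and its per-token compute can diverge by an order of magnitude.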

The elephant, revealed

Before the official launch, the model spent days on OpenRouter under the codename "Elephant Alpha." Developers used it blind. Ant says it topped the trending charts during that run and peaked at roughly 100 billion tokens of daily traffic. Take that number with whatever skepticism you apply to a company citing its own traffic figures.

Kilo Code confirmed the identity in a blog post, with the predictable joke that you can't spell Elephant without Ant.

What the tokens actually cost

On the Artificial Analysis Intelligence Index, Ling-2.6-flash scores 26. That's a 10-point jump over its predecessor Ling-flash-2.0, though the index aggregates ten different evals, so a single number flattens a lot of detail.

The interesting part is what it cost to get that score. Total output tokens consumed across the full eval: 15 million. Nemotron-3-Super, which Ant picked as its comparison, consumed more than 110 million. Ant's framing: 86% less spend for comparable intelligence. The framing Ant leaves out: it chose the comparison.
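
The 86% figure is just the ratio of the two token counts; a quick check:

```python
ling_tokens = 15e6        # output tokens Ling-2.6-flash spent on the eval
nemotron_tokens = 110e6   # output tokens Ant reports for Nemotron-3-Super
print(f"{1 - ling_tokens / nemotron_tokens:.1%}")  # 86.4% fewer tokens
```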

API pricing lands at $0.10 per million input tokens and $0.30 per million output on Ant's endpoint. Free on OpenRouter for the first week.
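
At those rates, pricing a workload is one multiplication per direction. A sketch, with invented token counts for a hypothetical agent session:

```python
# Ant's published endpoint pricing (dollars per million tokens).
INPUT_PER_M, OUTPUT_PER_M = 0.10, 0.30

def cost_usd(input_tokens: int, output_tokens: int) -> float:
    return input_tokens / 1e6 * INPUT_PER_M + output_tokens / 1e6 * OUTPUT_PER_M

# Hypothetical session: 2M tokens of context read, 500K tokens generated.
print(f"${cost_usd(2_000_000, 500_000):.2f}")  # $0.35
```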

Built for agents

Ant optimized specifically for agentic workflows, citing results on BFCL-V4, SWE-bench Verified, TAU2-bench, Claw-Eval and PinchBench. The company calls it SOTA in its size class, a qualifier every size-class claim carries these days; against genuinely larger models, the claim doesn't hold.

On speed: 215 tokens per second sustained, peaking around 340 on a four-card H20 setup, with prefill throughput Ant puts at 2.2 times Nemotron-3-Super's. Fast, by current standards.
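
To translate those decode rates into wall-clock time, divide a response length by the reported figures; the response sizes below are hypothetical:

```python
SUSTAINED_TPS, PEAK_TPS = 215, 340    # decode rates Ant reports on 4x H20

for n_tokens in (500, 1_000, 4_000):  # hypothetical response lengths
    print(f"{n_tokens} tokens: {n_tokens / SUSTAINED_TPS:.1f}s sustained, "
          f"{n_tokens / PEAK_TPS:.1f}s at peak")
```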

Availability

The model is live on OpenRouter with free access this week, and through Ant's Alipay Tbox platform. A commercial variant called LingDT routes through Ant Digital Technologies for enterprise customers. Previous Ling generations remain in Ant's GitHub repo; the 2.6-flash weights were not posted there at the time of the announcement.
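
OpenRouter exposes an OpenAI-compatible chat completions endpoint, so kicking the tires on the free tier takes a few lines of Python. The model slug here is a guess built from Ant's inclusionAI org name; check OpenRouter's model list for the real identifier.

```python
import os
import requests

# OpenRouter speaks the OpenAI-compatible chat completions protocol.
resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "inclusionai/ling-2.6-flash",  # hypothetical slug, verify first
        "messages": [
            {"role": "user", "content": "Summarize MoE routing in two sentences."}
        ],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```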

The free OpenRouter tier ends seven days from launch. After that, usage falls back to the $0.10 and $0.30 per-million pricing.

Tags: Ant Group, Ling 2.6 flash, MoE, open source LLM, AI agents, OpenRouter, token efficiency, Chinese AI, inclusionAI
Liza Chan

AI & Emerging Tech Correspondent

Liza covers the rapidly evolving world of artificial intelligence, from breakthroughs in research labs to real-world applications reshaping industries. With a background in computer science and journalism, she translates complex technical developments into accessible insights for curious readers.

