LLMs & Foundation Models

Moonshot AI Drops K2.5 With Self-Directing Agent Swarms

The Beijing startup's latest model orchestrates up to 100 sub-agents and generates code from video input

Andrés Martínez, AI Content Writer
January 27, 2026 · 5 min read
[Image: abstract network visualization showing interconnected AI agents working in parallel]

Moonshot AI released Kimi K2.5 on January 27, 2026, pushing into territory that makes the previous K2 Thinking model look almost quaint. The new release adds native multimodal capabilities and something the company calls "agent swarm" technology, where the model spawns and coordinates up to 100 specialized sub-agents to tackle complex tasks in parallel.

The swarm thing

What grabbed my attention was the agent swarm architecture. According to Moonshot's technical blog, K2.5 was trained using Parallel-Agent Reinforcement Learning (PARL), which teaches the model to decompose tasks and farm them out to dynamically instantiated sub-agents. The model figures out what specialists it needs (an "AI Researcher," a "Physics Researcher," a "Fact Checker," whatever) and spins them up on the fly.
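The orchestration pattern Moonshot describes can be sketched, very loosely, as a coordinator that decomposes a task and runs role-scoped sub-agents in parallel. Everything below is a hypothetical illustration, not Moonshot's implementation: `decompose`, `run_subagent`, and the role names are stand-ins for what the trained model decides on its own.

```python
# Toy sketch of swarm-style orchestration: decompose a task into specialist
# roles, then run the sub-agents concurrently and collect their results.
from concurrent.futures import ThreadPoolExecutor

def decompose(task: str) -> list[tuple[str, str]]:
    # Stand-in for the model deciding which specialists it needs.
    return [
        ("AI Researcher", f"survey prior work for: {task}"),
        ("Physics Researcher", f"check physical plausibility of: {task}"),
        ("Fact Checker", f"verify claims in: {task}"),
    ]

def run_subagent(role: str, subtask: str) -> str:
    # Stand-in for an LLM call scoped to one specialist role.
    return f"[{role}] finished: {subtask}"

def orchestrate(task: str, max_agents: int = 100) -> list[str]:
    subtasks = decompose(task)[:max_agents]
    with ThreadPoolExecutor(max_workers=len(subtasks)) as pool:
        futures = [pool.submit(run_subagent, role, st) for role, st in subtasks]
        return [f.result() for f in futures]

results = orchestrate("evaluate a fusion-reactor design claim")
print(len(results))  # 3
```

The interesting part of PARL, per Moonshot's description, is that both the decomposition and the coordination are learned end-to-end rather than hand-written like this.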

Moonshot claims this approach reduces execution time by up to 4.5x compared to single-agent setups. The swarm can execute up to 1,500 coordinated tool calls across those 100 sub-agents. The company describes a failure mode they call "serial collapse" where the orchestrator defaults back to doing everything sequentially, and apparently spent considerable training effort preventing that.

This isn't entirely new. MiniMax M2 had interleaved thinking, and multi-agent systems have been a research focus for years. But baking swarm orchestration directly into the model and training it end-to-end feels like a meaningful step.

Agent Swarm is currently in beta on kimi.com.

Visual coding that actually works

K2.5 is natively multimodal, trained on roughly 15 trillion mixed visual and text tokens on top of the K2 base. The model can watch a screen recording and reconstruct the complete frontend code, including interaction logic. The demos show it reproducing websites from video input.

More interesting to me: autonomous visual debugging. The model can inspect its own generated output visually, compare it against a reference, and iterate without human intervention. Moonshot demonstrated this with a task translating the aesthetic of Matisse's "La Danse" into a webpage, with the model checking its work visually at each step.
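The iterate-until-it-matches loop behind that demo might look like the following toy sketch. This is my assumption of the pattern, not Moonshot's code: `visual_gap` and `revise` stand in for rendering the output, diffing it against the reference, and regenerating from that feedback.

```python
# Minimal sketch of a visual self-debugging loop: measure the visual gap to a
# reference, revise, and repeat until the gap falls below a threshold.

def revise(gap: int) -> int:
    # Stand-in for "regenerate the code given visual feedback";
    # here each revision halves the remaining visual gap.
    return gap // 2

def visual_gap(gap: int) -> int:
    # Stand-in for rendering the output and diffing it against the reference.
    return gap

def self_debug(initial_gap: int, threshold: int = 1, max_iters: int = 10):
    gap, iters = initial_gap, 0
    while visual_gap(gap) > threshold and iters < max_iters:
        gap = revise(gap)
        iters += 1
    return gap, iters

final_gap, steps = self_debug(64)
print(final_gap, steps)  # 1 6
```

The real loop would substitute an actual browser render and a multimodal comparison for these numeric stand-ins; the control flow is the point.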

The benchmark numbers on vision tasks are strong. K2.5 scores 78.5% on MMMU-Pro multimodal reasoning, trailing GPT-5.2 (79.5%) but ahead of Claude Opus 4.5 (74.0%). On their new WorldVQA benchmark for vision-centric world knowledge, K2.5 hits 46.3%, roughly matching Gemini 3 Pro and substantially beating GPT-5.2's 28.0%.

The benchmark situation

The headline number from Moonshot is 50.2% on Humanity's Last Exam with tools enabled. That beats GPT-5.2's 45.5% and Claude Opus 4.5's 43.2% on the same test. On SWE-bench Verified, the software engineering benchmark, K2.5 scores 76.8%, though Claude Opus 4.5 still edges it out at 80.9%.

I want to flag something: these are Moonshot's own evaluations. The company couldn't test GPT-5.2 on all benchmarks due to "service stability issues" and some numbers come from their re-evaluation of competitors under their own conditions. Independent verification from Artificial Analysis and Vals AI on previous Kimi models has sometimes told a different story than Moonshot's published figures.

The BrowseComp scores are worth noting though. K2.5 hits 74.9% with context management enabled, and the Agent Swarm mode pushes that to 78.4%. That's for web browsing and information retrieval tasks, where the parallel agent architecture presumably helps.

What you actually get

K2.5 ships in multiple modes through kimi.com and the API: K2.5 Instant, K2.5 Thinking, K2.5 Agent, and K2.5 Agent Swarm (beta). The architecture is unchanged from K2: a trillion-parameter MoE model with 32 billion parameters active per token, 256K context window, native INT4 quantization.
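Moonshot has served earlier Kimi models through an OpenAI-compatible chat-completions API; assuming K2.5 follows suit, picking a mode is just a matter of the model ID. The IDs below are illustrative guesses, not confirmed identifiers, and this builds the request payload only, without sending it.

```python
# Sketch of selecting a K2.5 mode via an OpenAI-compatible chat payload.
# The model IDs are assumptions for illustration; check Moonshot's docs.

def build_request(mode: str, prompt: str) -> dict:
    model_ids = {
        "instant": "kimi-k2.5-instant",    # assumed ID
        "thinking": "kimi-k2.5-thinking",  # assumed ID
        "agent": "kimi-k2.5-agent",        # assumed ID
    }
    return {
        "model": model_ids[mode],
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 1024,
    }

req = build_request("thinking", "Reconstruct this page's frontend code.")
```

The payload would then be POSTed to Moonshot's chat-completions endpoint with an API key, exactly as with any OpenAI-compatible provider.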

The weights are available on Hugging Face under a modified MIT license. The modification: if you're using it commercially with over 100 million monthly active users or $20 million monthly revenue, you need to display Kimi K2 branding in your UI. That's the same license as K2 Thinking.

Moonshot also launched Kimi Code alongside K2.5, a coding assistant that integrates with VSCode, Cursor, and Zed. The GitHub repo for the broader Kimi K2 family shows deployment guides for vLLM, SGLang, and KTransformers.

Money and momentum

The timing here matters. Moonshot AI just raised $500 million at a $4.3 billion valuation in late December, led by IDG Capital with Alibaba and Tencent participating. According to CNBC, the company is now closing another round that will value it at least $500 million higher, bringing the valuation to $4.8 billion. Two of Moonshot's rivals, Zhipu and MiniMax, recently went public in Hong Kong to enthusiastic responses.

The company, formally Beijing Moonshot AI Technology Co. Ltd., was founded in March 2023. The K2 line has been a genuine success for them, with Kimi K2 Thinking in November claiming to outperform GPT-5 and Claude Sonnet 4.5 on several benchmarks. That release reportedly increased Moonshot's overseas API revenue fourfold and boosted paying users by 170%.

What K2.5 represents is less a dramatic capability leap and more a consolidation: native multimodality plus agent swarms plus the existing reasoning and tool-use strengths. The office productivity angle (Word, Excel, PPT handling, 10,000-word documents) is the kind of practical application that might actually drive enterprise adoption.

The FTC's recent reports on AI ecosystem diversity seem relevant here. Chinese open-weight models have become genuinely competitive with Western frontier models, and K2.5 continues that trend. Whether the benchmarks hold up under independent evaluation is the open question.

Tags: Kimi K2.5, Moonshot AI, agent swarm, open-source AI, multimodal models, visual coding, Chinese AI, benchmark results, agentic AI
Andrés Martínez

AI Content Writer

Andrés reports on the AI stories that matter right now. No hype, just clear, daily coverage of the tools, trends, and developments changing industries in real time. He makes the complex feel routine.

