Alibaba's Qwen team launched Qwen3.7-Max on Wednesday, pitching it as a flagship model for agent workloads rather than chat. The headline demo, described in the official blog, is a 35-hour autonomous run in which the model made 1,158 tool calls to optimize a single attention kernel and produced code that runs roughly 10x faster.
The loop was simple. Compile, profile, find the bottleneck, rewrite, run again. According to TechNode's writeup, the test ran on Alibaba's new Zhenwu M890 chip, announced the same week. The 10x figure is Alibaba's own, against the previous kernel implementation. No independent reproduction yet.
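For readers who want to picture that loop, here is a minimal Python sketch. It is not Alibaba's harness, which has not been published; the names `optimize_kernel`, `toy_profile`, and `toy_rewrite` are hypothetical, and the "profiler" is a toy stand-in that simply counts the word "slow" in a string. The only thing it borrows from the description above is the shape of the loop: measure, rewrite the hot spot, measure again, keep the change only if it is faster.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Result:
    source: str
    runtime_ms: float
    steps: int


def optimize_kernel(
    source: str,
    profile: Callable[[str], Tuple[float, str]],
    rewrite: Callable[[str, str], str],
    max_steps: int,
) -> Result:
    """Greedy loop: keep a rewrite only if it measures faster."""
    # Compile and profile the starting point (both folded into `profile` here).
    best_runtime, bottleneck = profile(source)
    steps = 0
    for _ in range(max_steps):
        candidate = rewrite(source, bottleneck)       # rewrite the hot spot
        runtime, new_bottleneck = profile(candidate)  # run again
        steps += 1
        if runtime < best_runtime:                    # keep improvements only
            source, best_runtime, bottleneck = candidate, runtime, new_bottleneck
    return Result(source, best_runtime, steps)


# Toy stand-ins (assumptions for illustration, not a real compiler or profiler):
# the "runtime" is one plus the number of "slow" tokens left in the source.
def toy_profile(src: str) -> Tuple[float, str]:
    return float(src.count("slow") + 1), "slow"


def toy_rewrite(src: str, bottleneck: str) -> str:
    return src.replace(bottleneck, "fast", 1)


result = optimize_kernel("slow slow slow", toy_profile, toy_rewrite, max_steps=5)
print(result)
```

In the toy run the loop stops improving after three rewrites and spends its last two steps on candidates that are no faster. A real agent would spend most of its tool calls on exactly that kind of dead end.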
Qwen's broader claim is that agentic behavior generalizes across training environments the way language skills generalize across varied text. That's the pitch. It's hard to test from a press post.
Third-party numbers tell a more mixed story. Benchmark coverage puts Qwen3.7-Max at 56.6 on the Artificial Analysis Intelligence Index, fifth overall and ahead of Gemini 3.5 Flash. But the model's hallucination rate dropped 21 points partly because it declines to answer more often: its attempt rate fell from 67% to 48%. Saying "I don't know" more often lifts the index without making the model smarter.
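A rough back-of-the-envelope sketch shows why fewer attempts can cut wrong answers on their own. It rests on two loud assumptions: wrong answers are counted as a share of all questions, which is a simplification and not necessarily how Artificial Analysis scores it, and the model is right 60% of the time when it does answer, a hypothetical figure chosen only for illustration. Only the 67% and 48% attempt rates come from the coverage.

```python
def answer_breakdown(attempt_rate: float, accuracy_when_attempting: float) -> dict:
    """Split all questions into correct, wrong, and declined shares."""
    correct = attempt_rate * accuracy_when_attempting
    wrong = attempt_rate * (1.0 - accuracy_when_attempting)
    declined = 1.0 - attempt_rate
    return {"correct": correct, "wrong": wrong, "declined": declined}


# Hypothetical: accuracy on attempted questions is held fixed at 60%.
ASSUMED_ACCURACY = 0.60

before = answer_breakdown(0.67, ASSUMED_ACCURACY)  # attempt rate from the article
after = answer_breakdown(0.48, ASSUMED_ACCURACY)   # attempt rate from the article

# Change in the wrong-answer and correct-answer shares, in percentage points.
wrong_drop_points = (before["wrong"] - after["wrong"]) * 100
correct_drop_points = (before["correct"] - after["correct"]) * 100

print(f"wrong answers fall by {wrong_drop_points:.1f} points")
print(f"correct answers fall by {correct_drop_points:.1f} points")
```

Under those assumptions the wrong-answer share falls by about 7.6 points with no change in underlying skill, and the correct-answer share falls too. The numbers are illustrative, not a reconstruction of the reported 21-point drop.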
Max stays proprietary while the Plus variant will go open-source, continuing the paywall drift that started after Alibaba killed Qwen Code's free tier. API access is live on Alibaba Model Studio, and a preview is running on Qwen Studio in deep thinking mode only, with web search and code interpreter disabled.
Bottom Line
Qwen3.7-Max ranks fifth on the Artificial Analysis Intelligence Index at 56.6, according to third-party benchmark coverage, behind GPT-5.5, Claude Opus 4.7, and two Gemini models.
Quick Facts
- 35-hour autonomous run, company-reported
- 1,158 tool calls in a single session
- 10x kernel speedup vs previous version (Alibaba-reported)
- Artificial Analysis Intelligence Index: 56.6, 5th overall
- Launched May 20, 2026 at Alibaba Cloud Summit




