Alibaba Qwen3.7-Max Runs 35 Hours, 1,158 Tool Calls

Alibaba's Qwen team launched Qwen3.7-Max on Wednesday, pitching it as a flagship model for agent workloads rather than chat. The headline demo, described in the official blog, is a 35-hour autonomous run in which the model made 1,158 tool calls to optimize a single attention kernel and produced code that runs roughly 10x faster.

The loop was simple. Compile, profile, find the bottleneck, rewrite, run again. According to TechNode's writeup, the test ran on Alibaba's new Zhenwu M890 chip, announced the same week. The 10x figure is Alibaba's own, against the previous kernel implementation. No independent reproduction yet.
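For readers who want to picture that loop, here is a minimal sketch in Python. It is not Alibaba's harness: `optimize`, `measure`, and `propose` are hypothetical names, the "compiler" and "profiler" are toy stand-ins, and the numbers it produces are made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Candidate:
    source: str        # kernel source text
    runtime_ms: float  # measured runtime for that source


def optimize(
    source: str,
    measure: Callable[[str], float],       # hypothetical: compile + profile, returns runtime
    propose: Callable[[str, float], str],  # hypothetical: rewrite based on the last profile
    max_steps: int,
) -> Tuple[Candidate, int]:
    """Keep the fastest version seen; count each measure/propose call as one tool call."""
    best = Candidate(source, measure(source))
    tool_calls = 1
    for _ in range(max_steps):
        rewritten = propose(best.source, best.runtime_ms)
        runtime = measure(rewritten)
        tool_calls += 2
        if runtime < best.runtime_ms:  # only accept rewrites that are actually faster
            best = Candidate(rewritten, runtime)
    return best, tool_calls


# Toy stand-ins (assumptions): each " opt" token pretends to be one applied optimization.
def toy_measure(src: str) -> float:
    return 100.0 / (1 + src.count(" opt"))


def toy_propose(src: str, _runtime_ms: float) -> str:
    return src + " opt"


best, calls = optimize("kernel", toy_measure, toy_propose, max_steps=3)
print(best.runtime_ms, calls)
```

In a real run, the measure and propose steps would be the model's tool calls to a compiler, a profiler, and an editor, which is how a single session can accumulate more than a thousand of them.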

Qwen's broader claim is that agentic behavior generalizes across training environments the way language skills generalize across varied text. That's the pitch. It's hard to test from a press post.

Third-party numbers tell a more mixed story. Benchmark coverage puts Qwen3.7-Max at 56.6 on the Artificial Analysis Intelligence Index, fifth overall and ahead of Gemini 3.5 Flash. But the model's hallucination rate dropped 21 points partly because it refuses to answer more often. Its attempt rate fell from 67% to 48%. Saying "I don't know" lifts the index without making the model smarter.
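The arithmetic behind that last point can be shown with a toy calculation. The sketch below is an illustration under stated assumptions, not Artificial Analysis's scoring formula: it imagines 100 questions, a model that genuinely knows 40 of them (a hypothetical figure), and an idealized model that always answers the questions it knows first. Only the 67% and 48% attempt rates come from the reporting above, and the result is not meant to reproduce the reported 21-point drop.

```python
def wrong_share(total: int, known: int, attempt_rate: float) -> float:
    """Share of all questions answered incorrectly, under the toy assumptions above."""
    attempted = round(total * attempt_rate)
    correct = min(known, attempted)  # idealized: known questions are attempted first
    wrong = attempted - correct      # every other attempt is a wrong guess
    return wrong / total


TOTAL = 100  # hypothetical question count
KNOWN = 40   # hypothetical number of questions the model can actually answer

before = wrong_share(TOTAL, KNOWN, 0.67)  # attempt rate before, per the coverage
after = wrong_share(TOTAL, KNOWN, 0.48)   # attempt rate after, per the coverage
print(before, after)
```

In this toy setup the number of correct answers is 40 in both cases. Only the wrong guesses shrink, which is the sense in which declining to answer can improve a score without adding knowledge.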

Max stays proprietary, continuing the paywall drift that started after Alibaba killed Qwen Code's free tier. The Plus variant will go open-source. API access is live on Alibaba Model Studio, and a preview is running on Qwen Studio in deep thinking mode only, with web search and code interpreter disabled.

Bottom Line

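Qwen3.7-Max ranks fifth on the third-party Artificial Analysis Intelligence Index at 56.6, behind GPT-5.5, Claude Opus 4.7, and two Gemini models. The 35-hour run and the 10x kernel speedup remain Alibaba's own figures, with no independent reproduction yet.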

Quick Facts

35-hour autonomous run, company-reported
1,158 tool calls in single session
10x kernel speedup vs previous version (Alibaba-reported)
Artificial Analysis Intelligence Index: 56.6, 5th overall
Launched May 20, 2026 at Alibaba Cloud Summit

Tags: Alibaba, Qwen, AI agents, large language models, China AI, benchmarks

Andrés Martínez

AI Content Writer

Andrés reports on the AI stories that matter right now. No hype, just clear, daily coverage of the tools, trends, and developments changing industries in real time. He makes the complex feel routine.
