
Alibaba's Qwen3.7-Max Runs 35 Hours Autonomously on Kernel Task

New flagship agent model made 1,158 tool calls in one session to optimize an attention kernel.

Andrés Martínez
AI Content Writer
May 21, 2026
[Image: Abstract visualization of an AI agent looping through code compilation and profiling on a server rack]

Alibaba's Qwen team launched Qwen3.7-Max on Wednesday, pitching it as a flagship model for agent workloads rather than chat. The headline demo, described in the official blog, is a 35-hour autonomous run in which the model made 1,158 tool calls to optimize a single attention kernel and produced code that runs roughly 10x faster.

The loop was simple. Compile, profile, find the bottleneck, rewrite, run again. According to TechNode's writeup, the test ran on Alibaba's new Zhenwu M890 chip, announced the same week. The 10x figure is Alibaba's own, against the previous kernel implementation. No independent reproduction yet.
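The compile, profile, rewrite cycle described above can be sketched in a few lines. This is a toy illustration, not Alibaba's harness: the "kernel" is a hypothetical dictionary of stage costs, `profile` and `rewrite` are made-up stand-ins for the real compiler, profiler, and model calls, and the assumption that each rewrite halves the bottleneck's cost is invented for the example.

```python
from typing import Dict, Tuple

# Toy stand-in for a kernel: stage name -> cost in milliseconds (hypothetical).
Kernel = Dict[str, float]


def profile(kernel: Kernel) -> Tuple[float, str]:
    """Return the total runtime and the name of the most expensive stage."""
    total = sum(kernel.values())
    bottleneck = max(kernel, key=kernel.get)
    return total, bottleneck


def rewrite(kernel: Kernel, bottleneck: str) -> Kernel:
    """Toy rewrite step: pretend the agent halves the bottleneck's cost."""
    improved = dict(kernel)  # copy so the previous version is kept intact
    improved[bottleneck] = kernel[bottleneck] / 2
    return improved


def optimize(kernel: Kernel, max_iterations: int) -> Tuple[Kernel, float, int]:
    """Profile, rewrite the bottleneck, re-profile; keep only improvements."""
    best = kernel
    best_time, bottleneck = profile(best)
    iterations = 0
    for _ in range(max_iterations):
        candidate = rewrite(best, bottleneck)
        candidate_time, candidate_bottleneck = profile(candidate)
        iterations += 1
        if candidate_time >= best_time:
            break  # no measurable gain, so stop instead of looping forever
        best, best_time, bottleneck = candidate, candidate_time, candidate_bottleneck
    return best, best_time, iterations
```

Calling `optimize({"load": 4.0, "matmul": 8.0, "softmax": 2.0}, max_iterations=3)` takes the toy kernel from 14.0 ms to 6.0 ms in three iterations. A real run would swap the two helper functions for actual build and profiling tools, which is where the tool calls come from.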

Qwen's broader claim is that agentic behavior generalizes across training environments the way language skills generalize across varied text. That's the pitch. It's hard to test from a press post.

Third-party numbers tell a more mixed story. Benchmark coverage puts Qwen3.7-Max at 56.6 on the Artificial Analysis Intelligence Index, fifth overall and ahead of Gemini 3.5 Flash. But the model's hallucination rate dropped 21 points partly because it refuses to answer more often. Its attempt rate fell from 67% to 48%. Saying "I don't know" lifts the index without making the model smarter.
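A back-of-the-envelope calculation shows why answering less often can cut wrong answers without any gain in skill. This is not Artificial Analysis's scoring formula, which is not given here. The attempt rates are the two figures quoted above; the 50% accuracy on attempted questions is a made-up placeholder, held constant on purpose.

```python
from typing import Tuple


def answer_breakdown(
    attempt_rate: float, accuracy_when_attempting: float
) -> Tuple[float, float, float]:
    """Split all questions into correct, wrong, and declined shares."""
    correct = attempt_rate * accuracy_when_attempting
    wrong = attempt_rate * (1 - accuracy_when_attempting)
    declined = 1 - attempt_rate
    return correct, wrong, declined


# Hypothetical accuracy on attempted questions, identical before and after.
ASSUMED_ACCURACY = 0.5

# Attempt rates quoted in the article: 67% before, 48% after.
before = answer_breakdown(0.67, ASSUMED_ACCURACY)
after = answer_breakdown(0.48, ASSUMED_ACCURACY)

for label, (correct, wrong, declined) in (("before", before), ("after", after)):
    print(f"{label}: correct={correct:.1%} wrong={wrong:.1%} declined={declined:.1%}")
```

Under that assumption the share of wrong answers falls, but so does the share of correct ones. The model is no more accurate on the questions it does attempt; it simply attempts fewer of them. The toy numbers are not meant to reproduce the reported 21-point drop.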

Max stays proprietary, continuing the paywall drift that started after Alibaba killed Qwen Code's free tier. The Plus variant will go open-source. API access is live on Alibaba Model Studio, and a preview is running on Qwen Studio in deep thinking mode only, with web search and code interpreter disabled.


Bottom Line

Qwen3.7-Max ranks fifth on the Artificial Analysis Intelligence Index at 56.6, behind GPT-5.5, Claude Opus 4.7, and two Gemini models.

Quick Facts

  • 35-hour autonomous run, company-reported
  • 1,158 tool calls in single session
  • 10x kernel speedup vs previous version (Alibaba-reported)
  • Artificial Analysis Intelligence Index: 56.6, 5th overall
  • Launched May 20, 2026 at Alibaba Cloud Summit
Tags: Alibaba, Qwen, AI agents, large language models, China AI, benchmarks


Alibaba Qwen3.7-Max Runs 35 Hours, 1,158 Tool Calls | aiHola