Alibaba's Qwen team released Qwen3.5-397B-A17B today, the first open-weight model in the Qwen3.5 series. It's a Mixture-of-Experts (MoE) model with 397 billion total parameters but only 17 billion activated per token, with weights available on Hugging Face under the Apache 2.0 license.
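The total-versus-active split is worth making concrete. A quick back-of-the-envelope check using only the announced figures (the percentage is derived here, not stated by Alibaba):

```python
# Back-of-the-envelope: what fraction of Qwen3.5-397B-A17B's weights
# participate in a single forward pass? Figures from the release.
TOTAL_PARAMS_B = 397    # total parameters, in billions
ACTIVE_PARAMS_B = 17    # parameters activated per token, in billions

active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B
print(f"{active_fraction:.1%} of the model is active per token")  # 4.3%
```

That roughly 4% activation rate is what lets a near-400B model run with the per-token compute cost of a model in the high-teens of billions.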
The big shift from Qwen3 is that this model is natively multimodal. Where Qwen3 relied on separate vision models (Qwen3-VL), Qwen3.5 fuses text and image understanding by training on multimodal tokens from early in pretraining. On the architecture side, the model adopts the hybrid linear attention approach first seen in Qwen3-Next, interleaving Gated DeltaNet layers with standard gated attention in a 3:1 ratio across 60 layers. That design, paired with 512 routed experts, is built for throughput at long context: up to 262K tokens natively, and 1M via the hosted Qwen3.5-Plus API.
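The 3:1 ratio implies a repeating four-layer block. Here's a minimal sketch of how that interleaving could be laid out; the exact ordering within each block is an assumption, only the 45/15 split follows from the stated numbers:

```python
# Sketch of the hybrid layer layout implied by a 3:1 DeltaNet-to-attention
# ratio over 60 layers. The within-block ordering is hypothetical.
def hybrid_layout(n_layers=60, ratio=(3, 1)):
    """Return a list of layer types, interleaved per the given ratio."""
    n_delta, n_attn = ratio
    block = ["gated_deltanet"] * n_delta + ["gated_attention"] * n_attn
    assert n_layers % len(block) == 0, "layer count must be divisible by block size"
    return block * (n_layers // len(block))

layers = hybrid_layout()
print(layers.count("gated_deltanet"), layers.count("gated_attention"))  # 45 15
```

Linear-attention (DeltaNet) layers keep compute and memory roughly constant per token, so stacking three of them per full-attention layer is what makes the 262K native context economical to serve.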
Benchmark numbers in the GitHub repo tell a mixed story. Qwen3.5 leads on visual math benchmarks like MathVision (88.6, beating Gemini 3 Pro's 86.6) and scores 85.0 on MMMU. On text-only reasoning, it sits slightly behind the top proprietary models: 91.3 on AIME26 versus GPT-5.2's 96.7, and 83.6 on LiveCodeBench v6 compared to Gemini 3 Pro's 90.7. Agentic coding follows the same pattern, with 76.4 on SWE-bench Verified against Claude Opus 4.5's 80.9. All of these are self-reported numbers.
Language coverage jumps to 201 languages and dialects, up from Qwen3's 119. The RL training pipeline also scaled up, with Alibaba claiming reinforcement learning across "million-agent environments," though specifics on that infrastructure remain thin. The blog post promises more model sizes are coming.
For an open-weight model you can self-host, matching or approaching GPT-5.2 and Gemini 3 Pro on several benchmarks is notable. Availability starts now via SGLang, vLLM, and the Qwen API.
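For self-hosting, a vLLM launch might look like the following. The Hugging Face repo id follows the naming pattern of earlier Qwen releases and the parallelism setting is an assumption; check the model card for the recommended configuration:

```shell
# Hypothetical vLLM launch; adjust --tensor-parallel-size to your GPU count.
# The model id is assumed, not confirmed from the release notes.
vllm serve Qwen/Qwen3.5-397B-A17B \
  --tensor-parallel-size 8 \
  --max-model-len 262144
```

This exposes an OpenAI-compatible endpoint on port 8000 by default, so existing client code can point at it with only a base-URL change.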
Bottom Line
By its own reported numbers, Qwen3.5-397B-A17B is the strongest open-weight multimodal model available today, competitive with top proprietary models on vision benchmarks while activating only 17B of its 397B parameters.
Quick Facts
- 397B total parameters, 17B activated per token (MoE)
- Apache 2.0 license, weights on Hugging Face
- 201 languages and dialects supported
- 262K native context, 1M via hosted API
- 88.6 on MathVision, 85.0 on MMMU (company-reported)
- 60 layers: 45 Gated DeltaNet + 15 Gated Attention