Alibaba's Qwen team released Qwen3.5-397B-A17B today, the first open-weight model in the Qwen3.5 series. It's a Mixture-of-Experts (MoE) model with 397 billion total parameters but only 17 billion activated per token, with weights available on Hugging Face under the Apache 2.0 license.
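The total-versus-active split is worth making concrete. A quick back-of-the-envelope check using only the announced figures (the percentage is derived here, not stated by Alibaba):

```python
# Back-of-the-envelope: what fraction of Qwen3.5-397B-A17B's weights
# participate in a single forward pass? Figures from the release.
TOTAL_PARAMS_B = 397    # total parameters, in billions
ACTIVE_PARAMS_B = 17    # parameters activated per token, in billions

active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B
print(f"{active_fraction:.1%} of the model is active per token")  # 4.3%
```

That roughly 4% activation rate is what lets a near-400B model run with the per-token compute cost of a model in the high-teens of billions.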
The big shift from Qwen3 is that this model is natively multimodal. Where Qwen3 relied on separate vision models (Qwen3-VL), Qwen3.5 fuses text and image understanding by training on multimodal tokens from early in pretraining. On the architecture side, the model adopts the hybrid linear attention approach first seen in Qwen3-Next, interleaving Gated DeltaNet layers with standard gated attention in a 3:1 ratio across 60 layers. That design, paired with 512 routed experts, is built for throughput at long context: up to 262K tokens natively, and 1M via the hosted Qwen3.5-Plus API.
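The 3:1 ratio implies a repeating four-layer block. Here's a minimal sketch of how that interleaving could be laid out; the exact ordering within each block is an assumption, only the 45/15 split follows from the stated numbers:

```python
# Sketch of the hybrid layer layout implied by a 3:1 DeltaNet-to-attention
# ratio over 60 layers. The within-block ordering is hypothetical.
def hybrid_layout(n_layers=60, ratio=(3, 1)):
    """Return a list of layer types, interleaved per the given ratio."""
    n_delta, n_attn = ratio
    block = ["gated_deltanet"] * n_delta + ["gated_attention"] * n_attn
    assert n_layers % len(block) == 0, "layer count must be divisible by block size"
    return block * (n_layers // len(block))

layers = hybrid_layout()
print(layers.count("gated_deltanet"), layers.count("gated_attention"))  # 45 15
```

Linear-attention (DeltaNet) layers keep compute and memory roughly constant per token, so stacking three of them per full-attention layer is what makes the 262K native context economical to serve.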
Benchmark numbers in the GitHub repo tell a mixed story. Qwen3.5 leads on visual math benchmarks like MathVision (88.6, beating Gemini 3 Pro's 86.6) and scores 85.0 on MMMU. On text-only reasoning, it sits slightly behind the top proprietary models: 91.3 on AIME26 versus GPT-5.2's 96.7, and 83.6 on LiveCodeBench v6 compared to Gemini 3 Pro's 90.7. Agentic coding follows the same pattern, with 76.4 on SWE-bench Verified against Claude Opus 4.5's 80.9. All of these are self-reported numbers.
Language coverage jumps to 201 languages and dialects, up from Qwen3's 119. The RL training pipeline also scaled up, with Alibaba claiming reinforcement learning across "million-agent environments," though specifics on that infrastructure remain thin. The blog post promises more model sizes are coming.
For an open-weight model you can self-host, matching or approaching GPT-5.2 and Gemini 3 Pro on several benchmarks is notable. Availability starts now via SGLang, vLLM, and the Qwen API.
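For self-hosting, a vLLM launch might look like the following. The Hugging Face repo id follows the naming pattern of earlier Qwen releases and the parallelism setting is an assumption; check the model card for the recommended configuration:

```shell
# Hypothetical vLLM launch; adjust --tensor-parallel-size to your GPU count.
# The model id is assumed, not confirmed from the release notes.
vllm serve Qwen/Qwen3.5-397B-A17B \
  --tensor-parallel-size 8 \
  --max-model-len 262144
```

This exposes an OpenAI-compatible endpoint on port 8000 by default, so existing client code can point at it with only a base-URL change.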
Bottom Line
By its own reported numbers, Qwen3.5-397B-A17B is the strongest open-weight multimodal model available today, competitive with top proprietary models on vision benchmarks while activating only 17B of its 397B parameters.
Quick Facts
- 397B total parameters, 17B activated per token (MoE)
- Apache 2.0 license, weights on Hugging Face
- 201 languages and dialects supported
- 262K native context, 1M via hosted API
- 88.6 on MathVision, 85.0 on MMMU (company-reported)
- 60 layers: 45 Gated DeltaNet + 15 Gated Attention