Alibaba's Qwen team has released the small-model tier of its Qwen 3.5 series, adding 0.8B, 2B, 4B, and 9B parameter variants to a lineup that already tops out at 397 billion parameters. The models are available on Hugging Face under Apache 2.0, with base versions included.
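For anyone who wants to try them, pulling a checkpoint should look like any other Hugging Face release. A minimal sketch with transformers, assuming the repos follow Qwen's usual naming; the exact id "Qwen/Qwen3.5-9B-Instruct" is a guess, so check the official collection:

```python
# Minimal loading sketch. "Qwen/Qwen3.5-9B-Instruct" is an assumed repo id
# based on prior Qwen naming conventions; verify on Hugging Face.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3.5-9B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # GPU if available, otherwise CPU
)

inputs = tokenizer("Summarize the delta rule in one sentence.", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```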
What makes these interesting: they share the same unified architecture as the larger Qwen 3.5 models. That means native multimodality (text, image, video processed in a single model rather than bolted-on adapters), the hybrid Gated Delta Network plus Mixture-of-Experts design, and RL-scaled training. Cramming all of that into a 0.8B model is ambitious. Whether it holds up in practice is another question, as no independent benchmarks exist yet for these small variants.
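For a feel of what the linear-attention half of that hybrid does, here is a toy NumPy sketch of the gated delta rule recurrence from the Gated DeltaNet literature. It illustrates the general technique only; Alibaba hasn't published kernel-level details for Qwen 3.5, so the shapes and scalar gates here are illustrative assumptions:

```python
import numpy as np

def gated_delta_step(S, q, k, v, alpha, beta):
    """One step of the gated delta rule (illustrative, not Qwen's kernels).

    S     : (d_v, d_k) fast-weight state carried across tokens
    q, k  : (d_k,) query and L2-normalized key for this token
    v     : (d_v,) value for this token
    alpha : scalar gate in (0, 1) that decays the whole state
    beta  : scalar in (0, 1) controlling how strongly we overwrite
    """
    # Decay old memory, erase what the state currently returns for key k,
    # then write the fresh key -> value association.
    S = alpha * (S - beta * np.outer(S @ k, k)) + beta * np.outer(v, k)
    o = S @ q  # read the state with the query
    return S, o

# Tiny usage: store one association, then retrieve it with the same key.
d_k, d_v = 4, 4
S = np.zeros((d_v, d_k))
k = np.random.randn(d_k); k /= np.linalg.norm(k)
S, o = gated_delta_step(S, q=k, k=k, v=np.ones(d_v), alpha=0.9, beta=1.0)
print(o)  # recovers the stored value vector
```

The appeal for small models is that this state is a fixed-size matrix rather than a growing KV cache, which is exactly what you want on memory-constrained edge hardware.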
The release completes a rapid three-wave rollout. Alibaba shipped the flagship 397B-A17B on February 16, followed by medium models (27B, 35B-A3B, 122B-A10B) on February 24. The small models round out the family and target edge devices, phones, and local inference on consumer GPUs. Quantized versions from third-party providers like Unsloth are already appearing.
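Local inference on modest hardware is presumably the point of this tier. Until official quants settle, one plausible route is on-the-fly 4-bit quantization with bitsandbytes; the repo id below is, again, an assumption:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# NF4 keeps a 9B model within roughly 5-6 GB of VRAM (ballpark estimate).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3.5-9B-Instruct",  # assumed repo id
    quantization_config=bnb_config,
    device_map="auto",
)
```

Community quants like Unsloth's typically ship as GGUF files and run through llama.cpp instead.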
The 9B model is the one to watch. If it approaches the quality of prior-generation models roughly an order of magnitude larger (Qwen3's 4B famously matched Qwen2.5-72B on some benchmarks, an even wider gap), it could become a go-to for lightweight multimodal agents. Alibaba hasn't published detailed small-model benchmarks yet, so any such efficiency claims remain unverified. All models support 201 languages and the series' default "thinking mode" for chain-of-thought reasoning.
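If thinking mode works the way it does in Qwen3, it is toggled through the chat template. A sketch assuming the enable_thinking switch carries over, which is not yet confirmed for 3.5:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3.5-9B-Instruct")  # assumed repo id
messages = [{"role": "user", "content": "What is 17 * 24?"}]

prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # Qwen3's flag; set False to suppress the <think> block
)
print(prompt)
```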
## Bottom Line
Qwen 3.5 now spans eight model sizes from 0.8B to 397B, all sharing native multimodal capabilities under Apache 2.0.
## Quick Facts
- Four new sizes: 0.8B, 2B, 4B, 9B parameters
- License: Apache 2.0 (open-weight, commercial use allowed)
- Architecture: Gated Delta Networks + Mixture-of-Experts, native multimodal (see the message-format sketch after this list)
- Language support: 201 languages and dialects
- Base (pretrained) versions also released alongside instruct-tuned variants
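On the native-multimodal bullet above: if Qwen 3.5 keeps the interleaved message convention of the Qwen2.5-VL processors, an image-plus-text turn would be structured roughly like this (the processor class and repo id are assumptions):

```python
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("Qwen/Qwen3.5-9B-Instruct")  # assumed repo id

# Interleaved content list, following the Qwen2.5-VL message convention.
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "https://example.com/cat.jpg"},
        {"type": "text", "text": "What breed is this cat?"},
    ],
}]

prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
```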