Alibaba's Qwen team is pushing Qwen-Image-2.0-Pro, the higher-fidelity tier of its second-generation image model. The ModelScope demo is live, with API access through Alibaba Cloud's Model Studio.
Pro runs on the architecture Qwen detailed at launch in February: an 8B Qwen3-VL encoder feeding a 7B diffusion decoder, native 2048x2048 output, prompts up to 1,000 tokens. That's roughly 3x smaller than the 20B v1 model, with one endpoint covering both generation and editing.
The pitch is text-heavy work: posters, infographics, slides, bilingual Chinese and English typography. That's where most image models still garble characters, and where Qwen has consistently spent its training budget. The team claims first place on its own AI Arena leaderboard for both generation and editing, though AI Arena is operated by Alibaba itself. On the independent Artificial Analysis arena, GPT Image 2 leads text-to-image, with FLUX.2 Turbo on top of the open-weights list.
API calls use the qwen-image-2.0-pro identifier. Public weights for the 2.0 family haven't surfaced yet. The original 20B v1 weights remain on Hugging Face under Apache 2.0.
Bottom Line
Pro pairs an 8B Qwen3-VL encoder with a 7B diffusion decoder to generate natively at 2048x2048, served via Alibaba Cloud's Model Studio API.
Quick Facts
- Architecture: 8B Qwen3-VL encoder + 7B diffusion decoder
- Native resolution: 2048x2048 pixels
- Maximum prompt length: 1,000 tokens
- Predecessor (v1): 20B-parameter MMDiT model
- API model identifier: qwen-image-2.0-pro




