Alibaba's Qwen team released Qwen-Image-Layered this week, an open-source diffusion model that takes a flat image and breaks it into separate RGBA layers. Think Photoshop's layer panel, but automated by AI.
The core pitch: current image editing models struggle with consistency because everything lives on one canvas. Change a person's shirt and you might accidentally shift the background. Qwen-Image-Layered sidesteps this by physically separating elements into distinct layers with their own transparency masks. Edit one layer, leave everything else untouched.
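To see why layer separation makes edits non-destructive, it helps to recall how RGBA layers flatten into a single image: the standard alpha "over" operator. The sketch below is illustrative only (it is not Qwen-Image-Layered's code) and works on single straight-alpha pixels with channels in the 0-1 range; editing one layer changes the flattened result without touching any other layer's data.

```python
def over_pixel(top, bottom):
    """Standard straight-alpha 'over' compositing of two RGBA pixels.

    Each pixel is an (r, g, b, a) tuple with channels in [0, 1].
    """
    rt, gt, bt, at = top
    rb, gb, bb, ab = bottom
    a_out = at + ab * (1 - at)
    if a_out == 0:
        return (0.0, 0.0, 0.0, 0.0)

    def blend(ct, cb):
        return (ct * at + cb * ab * (1 - at)) / a_out

    return (blend(rt, rb), blend(gt, gb), blend(bt, bb), a_out)


def flatten(layers):
    """Flatten a back-to-front list of RGBA pixels into one pixel."""
    result = layers[0]
    for layer in layers[1:]:
        result = over_pixel(layer, result)
    return result


# Opaque red background, half-transparent blue foreground.
background = (1.0, 0.0, 0.0, 1.0)
foreground = (0.0, 0.0, 1.0, 0.5)
print(flatten([background, foreground]))  # → (0.5, 0.0, 0.5, 1.0)

# Recoloring only the foreground layer leaves the background data untouched.
green_foreground = (0.0, 1.0, 0.0, 0.5)
print(flatten([background, green_foreground]))  # → (0.5, 0.5, 0.0, 1.0)
```

Because the layers are stored separately, the second edit recomputes the composite from the same untouched background, which is exactly the consistency property the layered representation buys.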
The model supports variable decomposition (3-10 layers per image) and recursive splitting, so a single layer can itself be decomposed further. The team built three components to make this work: an RGBA-VAE that handles both RGB and RGBA in the same latent space, a VLD-MMDiT architecture for variable-length layer outputs, and a multi-stage training pipeline. Training data came from filtered Photoshop PSD files, giving the model real examples of professional layer structures.
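Recursive splitting implies a tree-shaped representation: a layer is either a leaf image or a group whose children are themselves layers. A minimal sketch of that idea using Pillow's real `Image.alpha_composite` API is below; the tree encoding (plain Python lists as groups, back-to-front order) is my own assumption for illustration, not the model's output format.

```python
from PIL import Image


def flatten(node):
    """Recursively flatten a layer tree into one RGBA image.

    A node is either a PIL Image (leaf layer) or a list of child
    nodes ordered back-to-front (a group). Groups may nest, mirroring
    recursive decomposition where a layer splits into sub-layers.
    """
    if isinstance(node, Image.Image):
        return node.convert("RGBA")
    out = flatten(node[0])
    for child in node[1:]:
        out = Image.alpha_composite(out, flatten(child))
    return out


# Opaque red background; a nested group holding an opaque blue layer.
bg = Image.new("RGBA", (2, 2), (255, 0, 0, 255))
fg = Image.new("RGBA", (2, 2), (0, 0, 255, 255))
tree = [bg, [fg]]  # the group [fg] could itself be split further

print(flatten(tree).getpixel((0, 0)))  # → (0, 0, 255, 255)
```

The same `flatten` call works whether the model produced 3 layers or 10, and whether any of them were recursively decomposed, since groups and leaves are handled uniformly.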
Weights are Apache 2.0 licensed. The model is already live on Hugging Face, Replicate, and fal.ai. A Gradio demo lets you test it without local setup.
One limitation worth noting: the team recommends a 640px resolution for this version, with 1024px as an alternative. Higher resolutions may come in future iterations.
The Bottom Line: Qwen-Image-Layered brings Photoshop-style layer separation to AI image editing, with open weights and immediate API access.
QUICK FACTS
- Supports 3-10 configurable RGBA layers per image
- Licensed under Apache 2.0
- Available on Hugging Face, Replicate, and fal.ai
- Recommended resolution: 640px (1024px optional)
- Paper published December 17, 2025 on arXiv (2512.15603)