
ByteDance Shrinks Image Diffusion to Run on Phones

DreamLite packs image generation and editing into a 390M-parameter model for on-device inference.

Andrés Martínez, AI Content Writer
April 15, 2026 · 2 min read
[Image: Smartphone generating a high-resolution image on screen, with a compact neural network architecture diagram overlaid]

ByteDance's Intelligent Creation Lab published DreamLite, a 0.39B-parameter diffusion model designed to generate and edit 1024×1024 images directly on a smartphone. No cloud required. The team claims it's the first unified on-device model to handle both tasks in a single network, and the project page demos look solid, if limited to curated examples.

The architecture pairs a pruned U-Net backbone derived from SDXL with a tiny 1.2M-parameter VAE and Qwen3-VL as the text encoder (quantized to 4-bit for mobile). Training follows a progressive curriculum: text-to-image first, then editing, then joint training on both. DMD2 step distillation compresses inference to just four denoising steps. On a Xiaomi 14 with a Snapdragon 8 Gen 3, the team reports sub-one-second generation using W8A8 quantization and pre-computed text embeddings. On an iPhone 17 Pro, with live text encoding via the 4-bit Qwen3-VL, that stretches to roughly three seconds.
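The few-step pipeline can be sketched in miniature. Everything below is illustrative: the toy denoiser, the noise schedule, and the shapes are placeholders, not DreamLite's actual architecture. The point is structural: once the text embedding is pre-computed (as in the Xiaomi 14 benchmark), the only per-image work is a fixed four-iteration loop.

```python
# Toy sketch of a few-step distilled diffusion sampler: a distilled
# model maps noisy latents toward a clean target in a fixed, small
# number of steps (here 4, as with DMD2-style step distillation).
# All names and the toy "model" are hypothetical stand-ins.
import numpy as np

SIGMAS = [1.0, 0.75, 0.5, 0.25]  # hypothetical 4-step noise schedule

def toy_denoiser(latent, text_embedding, sigma):
    """Stand-in for the distilled U-Net: nudges the latent toward the
    (toy) target encoded by the prompt embedding."""
    target = text_embedding  # pretend the embedding is the clean latent
    return latent + (target - latent) * (1.0 - sigma)

def sample(text_embedding, latent_shape, seed=0):
    rng = np.random.default_rng(seed)
    latent = rng.standard_normal(latent_shape)  # start from pure noise
    for sigma in SIGMAS:                        # four denoising steps total
        latent = toy_denoiser(latent, text_embedding, sigma)
    return latent

# Pre-computing the text embedding once means the loop above is the
# only work done per generated image.
embedding = np.ones(16)  # stands in for the (cached) Qwen3-VL output
image_latent = sample(embedding, latent_shape=(16,))
```

In the real system each step is a full U-Net forward pass, so cutting from dozens of steps to four is what makes sub-second mobile latency plausible.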

Benchmark numbers: 0.72 on GenEval for generation, 4.11 on ImgEdit for editing. Both are self-reported and beat existing mobile baselines like SnapGen and SANA-0.6B, according to the paper. The team acknowledges weak spots: the ultra-compact VAE struggles with text rendering and identity preservation in portraits. A larger VAE is planned.

The GitHub repo exists but contains no model weights or code yet. Release timeline: "coming soon."


Bottom Line

DreamLite fits both image generation and editing into a 390M-parameter model that runs locally on phones, but weights and code haven't shipped yet.

Quick Facts

  • Model size: 0.39B parameters (U-Net) with 1.2M-parameter VAE
  • Inference: 4 denoising steps via DMD2 distillation
  • Speed: sub-1s on Xiaomi 14 (pre-computed embeddings), ~3s on iPhone 17 Pro (live text encoding)
  • Benchmarks (self-reported): GenEval 0.72, ImgEdit 4.11
  • Paper published: March 30, 2026 on arXiv
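The W8A8 quantization behind the Xiaomi 14 speed figure can be illustrated with a generic symmetric int8 scheme. This is a sketch of the general technique, not ByteDance's kernels: weights and activations are each rounded to 8-bit integers with a per-tensor scale, the matrix multiply runs in integer arithmetic, and the scales are folded back in afterwards.

```python
# Generic symmetric per-tensor int8 quantization (W8A8 idea):
# 8-bit weights, 8-bit activations, integer matmul, float rescale.
import numpy as np

def quantize_int8(x):
    """Quantize a float tensor to int8 plus a single float scale."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def int8_matmul(q_w, s_w, q_a, s_a):
    """Integer matmul with an int32 accumulator; scales folded in after."""
    acc = q_w.astype(np.int32) @ q_a.astype(np.int32)
    return acc.astype(np.float64) * (s_w * s_a)

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8))   # toy weight matrix
a = rng.standard_normal((8, 3))   # toy activation matrix
q_w, s_w = quantize_int8(w)
q_a, s_a = quantize_int8(a)
approx = int8_matmul(q_w, s_w, q_a, s_a)
exact = w @ a                     # full-precision reference
```

On mobile NPUs the int8 path is what buys the speed: 8-bit weights quarter the memory traffic versus float32, and the accumulation stays in fast integer units, at the cost of a small quantization error in the result.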
Tags: ByteDance, DreamLite, on-device AI, diffusion models, mobile AI, image generation, image editing
Andrés Martínez

AI Content Writer

Andrés reports on the AI stories that matter right now. No hype, just clear, daily coverage of the tools, trends, and developments changing industries in real time. He makes the complex feel routine.
