Google DeepMind dropped DiffusionGemma on June 10, an experimental open model that writes text the way image models draw pictures. Instead of generating one token at a time, it starts with a 256-token "canvas" of random placeholders and refines the whole block over several passes until it reads cleanly. The launch post frames it as a speed play for local, single-user workflows.
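To make the "refine the whole block" idea concrete, here is a toy sketch in Python. It is not DiffusionGemma's algorithm or API: the function `refine_block`, the equal-share schedule, and the use of a known target sequence as a stand-in for the model's predictions are all assumptions made purely for illustration. Only the 256-token block size comes from the announcement.

```python
import math
import random

CANVAS_LEN = 256  # block size quoted in the article


def refine_block(target, passes=8, seed=0):
    """Toy parallel refinement (hypothetical, not the real model).

    Returns the final canvas and the number of unresolved positions
    remaining after each pass.
    """
    rng = random.Random(seed)
    # Start from random placeholder tokens that never match a real token.
    canvas = [f"<noise{rng.randrange(10)}>" for _ in target]
    unresolved = list(range(len(target)))
    history = []
    for p in range(passes):
        # Assumed schedule: resolve an equal share of what is left,
        # so the final pass clears every remaining position.
        k = math.ceil(len(unresolved) / (passes - p))
        chosen = set(rng.sample(unresolved, k))
        for i in chosen:
            # Stand-in for the model's prediction at position i.
            canvas[i] = target[i]
        unresolved = [i for i in unresolved if i not in chosen]
        history.append(len(unresolved))
    return canvas, history


target = [f"tok{i}" for i in range(CANVAS_LEN)]
canvas, history = refine_block(target)
print(history)
```

The point of the sketch is the shape of the loop: many positions across the block change in each pass, so the number of passes, not the number of tokens, sets the step count. An autoregressive decoder would need 256 sequential steps for the same block.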
It's a 26B Mixture of Experts model with 3.8B parameters active per step, built on the Gemma 4 backbone and shipped under Apache 2.0. Quantized, it fits in 18GB of VRAM. Google reports more than 1,000 tokens per second on a single H100 and 700+ on an RTX 5090, throughput it pegs at up to 4x faster than autoregressive Gemma 4. Those numbers are Google's own. The vLLM team, which made DiffusionGemma the first diffusion model it supports natively, claims roughly 1,200 tokens/sec at batch size 1 on an H200 in FP8.
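A quick back-of-the-envelope check on those figures, as a rough sketch only. It assumes 1GB means 10^9 bytes, treats the whole 18GB as available for weights (so the bits-per-weight figure is a ceiling, not the actual quantization format), and uses the 1,000 tokens/sec floor Google quotes for the H100. The variable names are illustrative.

```python
TOTAL_PARAMS = 26e9      # total parameters (from the article)
ACTIVE_PARAMS = 3.8e9    # parameters active per step (from the article)
VRAM_BYTES = 18e9        # assumption: 1GB = 10^9 bytes
H100_TOKENS_PER_SEC = 1000  # Google-reported lower bound
BLOCK_TOKENS = 256       # canvas size

# Share of the model's weights used on any single step.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# Upper bound on average bits per weight if all 18GB held weights only.
max_bits_per_weight = VRAM_BYTES * 8 / TOTAL_PARAMS

# Wall-clock time for one full 256-token block at the quoted rate.
seconds_per_block = BLOCK_TOKENS / H100_TOKENS_PER_SEC

print(f"active fraction: {active_fraction:.1%}")
print(f"bits per weight ceiling: {max_bits_per_weight:.2f}")
print(f"seconds per 256-token block: {seconds_per_block:.3f}")
```

Under these assumptions, about 15% of the weights are active per step, the 18GB budget allows at most roughly 5.5 bits per weight, and a full block lands in about a quarter of a second.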
And yes, it reasons. The model card lists a configurable thinking mode that emits an internal reasoning channel before the answer, which is unusual for a diffusion model.
The catch sits in plain sight: Google says DiffusionGemma loses to standard Gemma 4 across benchmarks. "For applications that demand maximum quality, we recommend deploying standard Gemma 4," the company writes. So this is a preview, not a replacement. The speed edge also fades under high-concurrency cloud loads, where batched autoregressive models stay ahead.
Weights are on Hugging Face now, with day-zero support across vLLM, Transformers, MLX, and Unsloth. A developer guide walks through the mechanics. Official llama.cpp support is coming soon.
Bottom Line
DiffusionGemma hits 1,000+ tokens/sec on one H100 but scores below standard Gemma 4 across benchmarks, by Google's own admission.
Quick Facts
- 26B total parameters, 3.8B active per step (MoE)
- Released June 10, 2026 under Apache 2.0
- 1,000+ tokens/sec on H100, 700+ on RTX 5090 (Google-reported)
- Generates 256-token blocks in parallel via diffusion
- Runs in 18GB VRAM when quantized; underperforms Gemma 4 on benchmarks




