PrismML squeezed the diffusion transformer of a 4B-class image model down to under a gigabyte, and it now runs on a phone. The Pasadena startup announced Bonsai Image 4B on Tuesday, a compressed image generator built on Black Forest Labs' FLUX.2 Klein 4B. It comes in two flavors, 1-bit and ternary.
The 1-bit variant cuts the diffusion transformer to 0.93 GB, which PrismML pegs at an 8.3x reduction from full precision. The ternary version lands at 1.21 GB, a 6.4x reduction by the company's count. PrismML says the compressed models retain up to 95% of the original quality, though that figure is self-reported with no independent benchmark behind it yet.
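The reported sizes and reduction factors can be cross-checked with simple arithmetic. The sketch below is a back-of-the-envelope check, not PrismML's methodology: it multiplies each compressed size by its reported reduction to get the implied full-precision size, then assumes (this is not stated by the company) that "full precision" means 16-bit weights to estimate the average bits stored per weight. The function names are illustrative.

```python
# Company-reported figures quoted in the article.
REPORTED = {
    "1-bit": {"size_gb": 0.93, "reduction": 8.3},
    "ternary": {"size_gb": 1.21, "reduction": 6.4},
}

# Assumption (not from PrismML): the full-precision baseline uses 16-bit weights.
BASELINE_BITS = 16


def implied_baseline_gb(size_gb, reduction):
    """Full-precision size implied by a compressed size and its reduction factor."""
    return size_gb * reduction


def effective_bits_per_weight(reduction, baseline_bits=BASELINE_BITS):
    """Average bits per weight if the baseline stores baseline_bits per weight."""
    return baseline_bits / reduction


for name, figures in REPORTED.items():
    base = implied_baseline_gb(figures["size_gb"], figures["reduction"])
    bits = effective_bits_per_weight(figures["reduction"])
    print(f"{name}: implied baseline {base:.2f} GB, ~{bits:.2f} bits/weight")
```

Both variants imply a baseline of roughly 7.7 GB, so the two reported ratios are consistent with each other. Under the 16-bit assumption, the averages come out near 1.9 and 2.5 bits per weight, above the nominal 1 bit and the roughly 1.58 bits a ternary value carries. The company's figures do not say what accounts for the gap.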
"Local image generation is the next major milestone for creative AI," said CEO Babak Hassibi, a Caltech professor. Standard launch framing. The harder numbers: a 512x512 image takes about 9.4 seconds on an iPhone 17 Pro Max and roughly 6 seconds on an M4 Pro Mac, where PrismML claims generation runs up to 5.6x faster than the uncompressed pipeline.
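For a rough sense of scale, the timing claims translate into throughput and an implied uncompressed time. This is a sketch under one stated assumption: that the "up to 5.6x" figure applies to the same 512x512 run on the M4 Pro. Because the speedup is an upper bound, the result is a ceiling on the uncompressed time, not a measurement. The helper names are illustrative.

```python
# Company-reported timings for one 512x512 image.
IPHONE_SECONDS = 9.4  # iPhone 17 Pro Max
MAC_SECONDS = 6.0     # M4 Pro Mac ("roughly 6 seconds")
MAX_SPEEDUP = 5.6     # "up to 5.6x" versus the uncompressed pipeline


def implied_baseline_seconds(compressed_seconds, speedup):
    """Uncompressed time implied by a compressed time and a speedup factor."""
    return compressed_seconds * speedup


def images_per_minute(seconds_per_image):
    """Throughput when images are generated back to back."""
    return 60.0 / seconds_per_image


# Assumption: the 5.6x speedup applies to this exact M4 Pro workload.
ceiling = implied_baseline_seconds(MAC_SECONDS, MAX_SPEEDUP)
print(f"M4 Pro uncompressed: at most ~{ceiling:.1f} s per image")
print(f"iPhone: ~{images_per_minute(IPHONE_SECONDS):.1f} images/min")
print(f"M4 Pro: ~{images_per_minute(MAC_SECONDS):.1f} images/min")
```

On those figures, the uncompressed pipeline would need at most about 34 seconds per image on the same Mac.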
Weights and code ship under Apache 2.0. There's a WebGPU browser demo, the full model collection on Hugging Face, and an iOS app called Bonsai Studio. PrismML didn't spell out how large the whole stack is once the text encoder is bundled in, an open question because that piece compresses far less than the transformer.
Bottom Line
PrismML's 1-bit Bonsai Image 4B compresses the diffusion transformer to 0.93 GB and generates a 512x512 image in about 9.4 seconds on an iPhone 17 Pro Max, per the company.
Quick Facts
- 1-bit diffusion transformer: 0.93 GB, an 8.3x reduction (company-reported)
- Ternary variant: 1.21 GB, a 6.4x reduction (company-reported)
- 512x512 image in about 9.4 seconds on an iPhone 17 Pro Max (company-reported)
- About 6 seconds on an M4 Pro Mac; up to 5.6x faster than the full-precision pipeline (company-reported)
- Retains up to 95% of full-precision quality (company-reported)
- Announced May 26, 2026; open weights under Apache 2.0




