Baidu ERNIE Image 8B: New AI Image Generator Coming Soon


Baidu is about to enter the dedicated image generation race. A ComfyUI pull request merged on April 12 implements support for a new ERNIE Image model, a diffusion transformer separate from Baidu's existing multimodal ERNIE line. The model weighs in at 8 billion parameters, according to the source, and borrows its VAE from Black Forest Labs' Flux architecture while using Mistral's Ministral 3.3B as its text encoder.

That's an unusual combo. The code review reveals the model uses rotary position embeddings, patch-based image encoding, and adaptive layer normalization for diffusion conditioning. Neither the weights nor a Hugging Face page has appeared yet, but ComfyUI's lead developer comfyanonymous has already merged the integration code, which suggests the release is days away at most. Where to get the weights and workflows remains the open question: at least one GitHub user has already asked.
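For readers unfamiliar with those three building blocks, here is a minimal sketch of each in plain Python. It is not ERNIE Image's code and not taken from the ComfyUI pull request: the function names (patchify, rope, ada_layer_norm), the toy 4x4 "image", and the hand-picked shift and scale values are all illustrative assumptions. In a real diffusion transformer the shift and scale would come from a learned projection of the timestep and conditioning embedding, and everything would run on tensors rather than lists.

```python
import math


def patchify(image, patch):
    """Split an H x W grid (list of rows) into flattened patch tokens, row-major."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "grid must divide evenly into patches"
    tokens = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            tokens.append([image[top + dy][left + dx]
                           for dy in range(patch) for dx in range(patch)])
    return tokens


def rope(vec, position, base=10000.0):
    """Rotary position embedding: rotate each consecutive pair of channels
    by an angle that depends on the token position and the pair index."""
    dim = len(vec)
    assert dim % 2 == 0, "rotary embeddings need an even channel count"
    out = []
    for i in range(0, dim, 2):
        theta = position * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out


def ada_layer_norm(vec, shift, scale, eps=1e-6):
    """Adaptive layer norm: normalize without learned affine parameters,
    then modulate with per-channel shift and scale supplied by the caller."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    normed = [(v - mean) / math.sqrt(var + eps) for v in vec]
    return [n * (1.0 + sc) + sh for n, sc, sh in zip(normed, scale, shift)]


# Toy walk-through: a 4x4 grid becomes four 2x2 patch tokens, each token is
# rotated according to its position, then normalized and modulated.
image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
tokens = patchify(image, patch=2)
rotated = [rope(tok, position=idx) for idx, tok in enumerate(tokens)]
shift, scale = [0.1] * 4, [0.5] * 4  # hypothetical conditioning outputs
modulated = [ada_layer_norm(tok, shift, scale) for tok in rotated]
```

The point of the adaptive variant is that the same normalization layer behaves differently at each denoising step, because the shift and scale change with the conditioning signal instead of being fixed weights.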

Baidu hasn't been a major player in standalone image generation. Its previous image work, ERNIE-ViLG, dates back to 2022, and more recently the company focused its visual generation efforts inside the massive 2.4-trillion-parameter ERNIE 5.0 multimodal model. A dedicated, open-weights image model would be a different move. The comparison floating around AI communities is Z-Image, Alibaba's 6B-parameter model that ranked first among open-source models on the Artificial Analysis leaderboard. Whether ERNIE Image can match that reception with 8B parameters and a completely different text encoder remains to be seen.

No pricing, license, or benchmark data has been disclosed. The model's Hugging Face page is expected to go live in the coming days.

Bottom Line

Baidu's new standalone image generation model pairs an 8B diffusion transformer with Flux's VAE and a Ministral 3.3B text encoder, and ComfyUI support was merged before the weights were even public.

Quick Facts

8B parameters (unverified, per source)
VAE: Flux architecture
Text encoder: Ministral 3.3B
ComfyUI PR #13369 merged April 12, 2026
Model weights not yet publicly available

Tags: Baidu, ERNIE, image generation, diffusion model, ComfyUI, open source AI, Ministral

Andrés Martínez

AI Content Writer

Andrés reports on the AI stories that matter right now. No hype, just clear, daily coverage of the tools, trends, and developments changing industries in real time. He makes the complex feel routine.
