Baidu Preps Standalone ERNIE Image Generator for Imminent Release

An 8B-parameter diffusion model with Flux VAE and Ministral text encoder.

Andrés Martínez, AI Content Writer
April 14, 2026 | 2 min read
Baidu is about to enter the dedicated image generation race. A ComfyUI pull request merged on April 12 implements support for a new ERNIE Image model, a diffusion transformer separate from Baidu's existing multimodal ERNIE line. The model weighs in at 8 billion parameters, according to the source, and borrows its VAE from Black Forest Labs' Flux architecture while using Mistral's Ministral 3 3B as its text encoder.

That's an unusual combo. A read of the code shows the model uses rotary position embeddings, patch-based image encoding, and adaptive layer normalization for diffusion conditioning. Neither the weights nor a Hugging Face page has appeared yet, but ComfyUI's lead developer, comfyanonymous, has already merged the integration code, which suggests the release is days away at most. At least one GitHub user has already asked where to find the weights and workflows.
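For readers unfamiliar with those three building blocks, here is a minimal, dependency-free sketch of each in its generic textbook form. It is not code from the ComfyUI pull request or from Baidu. The function names (`patchify`, `rope`, `ada_layer_norm`), the 1D token position, and the toy sizes are all assumptions made for illustration, and the real model may implement each step differently.

```python
import math

def patchify(image, patch):
    """Split an H x W grid (list of rows) into flattened patch tokens, row-major."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "dimensions must divide by patch size"
    tokens = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            tokens.append([image[top + dy][left + dx]
                           for dy in range(patch) for dx in range(patch)])
    return tokens

def rope(vec, position, base=10000.0):
    """Rotary embedding: rotate each consecutive pair by a position-dependent angle."""
    assert len(vec) % 2 == 0, "rotary embedding needs an even dimension"
    out = []
    for i in range(0, len(vec), 2):
        theta = position * base ** (-i / len(vec))
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out

def ada_layer_norm(vec, scale, shift, eps=1e-6):
    """Normalize vec, then modulate it with per-channel scale and shift.

    In a diffusion transformer the scale and shift would typically be
    predicted from the timestep (and other conditioning); here they are
    passed in directly as plain lists (an illustrative simplification).
    """
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    normed = [(v - mean) / math.sqrt(var + eps) for v in vec]
    return [n * (1.0 + g) + b for n, g, b in zip(normed, scale, shift)]

# Toy usage: a 4x4 single-channel "image" cut into four 2x2 patches.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
tokens = patchify(image, 2)
rotated = [rope(tok, pos) for pos, tok in enumerate(tokens)]
conditioned = [ada_layer_norm(tok, [0.0] * 4, [0.0] * 4) for tok in rotated]
```

The rotation leaves each token's length unchanged and only encodes where it sits, while the scale and shift are the hook through which conditioning signals steer every layer.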

Baidu hasn't been a major player in standalone image generation. Its previous image work, ERNIE-ViLG, dates back to 2022, and more recently the company focused its visual generation efforts inside the massive 2.4-trillion-parameter ERNIE 5.0 multimodal model. A dedicated, open-weights image model would be a different move. The comparison floating around AI communities is Z-Image, Alibaba's 6B-parameter model that ranked first among open-source models on the Artificial Analysis leaderboard. Whether ERNIE Image can match that reception with 8B parameters and a completely different text encoder remains to be seen.

No pricing, license, or benchmark data has been disclosed. The model's Hugging Face page is expected to go live in the coming days.


Bottom Line

Baidu's first standalone image generation model since ERNIE-ViLG pairs an 8B diffusion transformer with Flux's VAE and Ministral 3 3B, and ComfyUI support was merged before the weights went public.

Quick Facts

  • 8B parameters (unverified, per source)
  • VAE: Flux architecture
  • Text encoder: Ministral 3 3B
  • ComfyUI PR #13369 merged April 12, 2026
  • Model weights not yet publicly available
Tags: Baidu, ERNIE, image generation, diffusion model, ComfyUI, open source AI, Ministral