
Ideogram Releases Ideogram 4, Its First Open-Weight Image Model

A 9.3B open-weight image model that beats much bigger rivals on text, under a non-commercial license.

Liza Chan, AI & Emerging Tech Correspondent
June 4, 2026 (4 min read)
[Image: A GPU workstation rendering a detailed typographic poster, suggesting local AI image generation]

On June 3, Ideogram released the weights and inference code for Ideogram 4, the Toronto company's first open-weight text-to-image model. It has 9.3 billion parameters, ships in two quantized flavors on Hugging Face, and arrives with a JSON prompting interface that does most of the interesting work.

The small-model flex

The headline number is the parameter count, or rather how low it is. At 9.3B, Ideogram 4 is a fraction of the size of the open models it's being measured against, and the company's own GitHub repo claims it still beats them on text rendering. The comparisons it picks: Qwen-Image at 20B, FLUX.2 [dev] at 32B, and HunyuanImage 3.0, an 80B mixture-of-experts model. Beating something more than eight times your size on a single axis is a real result, though "best text rendering of any open-weight release we benchmarked" is doing some quiet work with that last clause: the claim covers only the models Ideogram chose to test.

Text in images has always been Ideogram's thing. Posters, signage, menus, logos, the stuff where most diffusion models still produce garbled letterforms. So leading there is less a surprise than a continuation.

What's under the hood

The architecture is the interesting part for anyone who follows model design. Ideogram 4 is a flow-matching model built on a single-stream DiT (diffusion transformer), meaning text and image tokens are concatenated into one sequence and run through the same 34-layer transformer. There are no separate text and image branches.
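For readers who want the single-stream idea in concrete terms, here is a minimal Python sketch. It is a toy, not Ideogram's code: `toy_layer` is a made-up stand-in for a real transformer block, and the token values are arbitrary. The only details taken from the release are the concatenation into one sequence and the 34-layer depth.

```python
from typing import Callable, List

Token = List[float]  # one token embedding, kept tiny for illustration

NUM_LAYERS = 34  # layer count reported for Ideogram 4


def toy_layer(sequence: List[Token]) -> List[Token]:
    """Hypothetical stand-in for a transformer block.

    A real block uses self-attention and an MLP. This one just nudges every
    token toward the sequence mean, which is enough to show that text and
    image tokens influence each other inside the same layer.
    """
    dim = len(sequence[0])
    mean = [sum(tok[i] for tok in sequence) / len(sequence) for i in range(dim)]
    return [[0.9 * tok[i] + 0.1 * mean[i] for i in range(dim)] for tok in sequence]


def single_stream_forward(
    text_tokens: List[Token],
    image_tokens: List[Token],
    layer: Callable[[List[Token]], List[Token]] = toy_layer,
    num_layers: int = NUM_LAYERS,
) -> List[Token]:
    """Concatenate both modalities, run one shared stack, return image tokens."""
    sequence = text_tokens + image_tokens  # one sequence, no separate branches
    for _ in range(num_layers):
        sequence = layer(sequence)
    return sequence[len(text_tokens):]  # keep only the image positions


if __name__ == "__main__":
    text = [[1.0, 1.0], [1.0, 1.0]]   # pretend prompt embeddings
    image = [[0.0, 0.0], [0.0, 0.0]]  # pretend noisy image latents
    print(single_stream_forward(text, image))
```

A two-branch design would instead keep separate weights per modality and only exchange information at attention time; the single-stream layout trades that separation for a simpler, shared stack.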

The bigger swing is the text encoder. Instead of CLIP or T5, Ideogram bolted on Qwen3-VL-8B-Instruct, a full vision-language model, and pulls hidden states from 13 intermediate layers. The technical writeup frames the encoder as frozen and reused, so the 9.3B figure counts only the trainable DiT. Worth keeping in mind when you compare that number to anyone else's: with the encoder loaded alongside it, the actual thing sitting on your GPU is bigger than the marketing figure suggests.
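A back-of-the-envelope sketch makes the gap concrete. The 9.3B figure is from the release; the 8B encoder size is an assumption read off the "8B" in the model name, and the bytes-per-parameter values are nominal storage widths that ignore activations, quantization constants, and any other runtime overhead. Whether the shipped checkpoints also quantize the encoder is not something the release notes quoted here settle, so the formats are left as arguments.

```python
DIT_PARAMS = 9_300_000_000      # trainable DiT, as reported by Ideogram
ENCODER_PARAMS = 8_000_000_000  # ASSUMPTION: nominal size implied by "8B" in the encoder name

# Nominal storage width per parameter. nf4 here ignores the small overhead
# of per-block quantization constants.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "nf4": 0.5}


def weights_gib(params: int, fmt: str) -> float:
    """Weights-only footprint in GiB for a given storage format."""
    return params * BYTES_PER_PARAM[fmt] / 2**30


def total_params() -> int:
    """Parameters resident on the device when DiT and encoder are both loaded."""
    return DIT_PARAMS + ENCODER_PARAMS


def footprint_gib(dit_fmt: str, encoder_fmt: str) -> float:
    """Combined weights-only footprint for a chosen pair of formats."""
    return weights_gib(DIT_PARAMS, dit_fmt) + weights_gib(ENCODER_PARAMS, encoder_fmt)


if __name__ == "__main__":
    print(f"total parameters: {total_params() / 1e9:.1f}B")
    for dit_fmt, enc_fmt in [("nf4", "nf4"), ("fp8", "fp8"), ("nf4", "bf16")]:
        gib = footprint_gib(dit_fmt, enc_fmt)
        print(f"DiT {dit_fmt} + encoder {enc_fmt}: ~{gib:.1f} GiB of weights")
```

Under these assumptions the loaded model is closer to 17B parameters than 9B, which is the number to hold against the 20B and 32B rivals.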

JSON or bust

The model was trained almost entirely on structured JSON captions, not plain English. You can still type a normal sentence, but the docs are blunt that you'll get worse results, because the model only ever saw exhaustive JSON during training. There's a hosted "magic prompt" expander that turns a casual line into the full structured caption, and the system prompt for it is open. For people who actually want the control, the JSON surface exposes bounding-box layout in normalized coordinates, per-element color palettes via hex codes, and aspect ratios stretching to 6:1.
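To give a feel for what a structured caption might look like, here is a Python sketch that builds one and sanity-checks it. Every field name (`aspect_ratio`, `elements`, `bbox`, `palette`) and the `[x0, y0, x1, y1]` box layout are hypothetical, invented for illustration; the real schema lives in Ideogram's docs. Only the three constraints come from the article: normalized coordinates, hex color codes, and aspect ratios up to 6:1.

```python
import json
import re
from typing import List

HEX_COLOR = re.compile(r"^#[0-9a-fA-F]{6}$")
MAX_ASPECT = 6.0  # the article says aspect ratios stretch to 6:1


def validate_caption(caption: dict) -> List[str]:
    """Return a list of problems found in a (hypothetical) structured caption."""
    problems = []

    width, height = caption["aspect_ratio"]
    if max(width / height, height / width) > MAX_ASPECT:
        problems.append(f"aspect ratio {width}:{height} exceeds {MAX_ASPECT:g}:1")

    for index, element in enumerate(caption["elements"]):
        x0, y0, x1, y1 = element["bbox"]  # ASSUMED layout: corners in [0, 1]
        if not (0.0 <= x0 < x1 <= 1.0 and 0.0 <= y0 < y1 <= 1.0):
            problems.append(f"element {index}: bbox is not normalized")
        for color in element.get("palette", []):
            if not HEX_COLOR.match(color):
                problems.append(f"element {index}: bad hex color {color!r}")

    return problems


if __name__ == "__main__":
    caption = {
        "aspect_ratio": [3, 4],
        "elements": [
            {
                "description": "headline text reading 'OPEN HOUSE'",
                "bbox": [0.1, 0.05, 0.9, 0.25],
                "palette": ["#1A1A2E", "#E94560"],
            }
        ],
    }
    print(validate_caption(caption) or "caption looks well-formed")
    print(json.dumps(caption, indent=2))
```

Checking a caption locally before sending it is cheap insurance: a box with swapped corners or a three-digit hex code is easier to catch here than to diagnose from a strange render.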

ComfyUI had day-zero support, per its launch post, which tells you the team coordinated the drop rather than tossing weights over the wall.

The benchmark everyone will quote

A blind typography review run by ContraLabs, judged by ten professional designers, put Ideogram 4 first 47.9% of the time. Next was Gemini 3.1 Flash Image Preview at 30.0%, then FLUX.2 [max] at 15.5% and Grok Imagine 1.0 at 15.0%. Asked whether they'd actually use the output in client work, the designers scored it 3.55 out of 5, ahead of the rest.

Ten judges is a small panel, and a vendor pointing you at a third party that ranked the vendor first is a setup to read with one eyebrow up. Still, blind beats not-blind, and typography is the one category where Ideogram has earned the benefit of the doubt.

The catch

It's not open source in the way the headlines imply. The weights ship under an Ideogram 4 Non-Commercial license, so you can download, fine-tune, and run locally, but you can't build a paying product on top without sorting out rights first. The weights are also gated on Hugging Face, requiring you to accept the license and authenticate before the download works.
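In practice, a gated download usually looks something like the sketch below, which uses the `huggingface_hub` library's `snapshot_download`. The repository ID is deliberately left as an argument because this article does not give it, and reading the token from the `HF_TOKEN` environment variable is one common convention, not something Ideogram's docs are known to require. The license still has to be accepted in the browser first, with the same account the token belongs to.

```python
import os
import sys
from typing import Mapping


def resolve_token(env: Mapping[str, str] = os.environ) -> str:
    """Fetch a Hugging Face access token from the environment, or fail loudly."""
    token = env.get("HF_TOKEN")
    if not token:
        raise RuntimeError(
            "Set HF_TOKEN to an access token for an account that has accepted the license."
        )
    return token


def download_weights(repo_id: str, local_dir: str) -> str:
    """Download every file in a gated repo and return the local path."""
    # Third-party dependency: pip install huggingface_hub
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,        # placeholder: take the real ID from the model page
        local_dir=local_dir,
        token=resolve_token(),  # gated repos reject anonymous requests
    )


if __name__ == "__main__":
    # Usage: python fetch_weights.py <org>/<repo> ./weights
    print(download_weights(sys.argv[1], sys.argv[2]))
```

If the token is valid but the license has not been accepted, the download fails with an access error, which is the gate doing its job.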

Then there's the safety layer. Prompt and output screening routes through Hive, which requires separate moderation keys. Reaction online split predictably: praise for the blind-test numbers and local access, grumbling about the filters and the license. The code being public means the moderation hooks are right there for anyone determined to strip them, which is its own conversation.

Both quantizations, nf4 for CUDA and fp8 for broader hardware, are live now, with more promised. If you don't want to run it yourself, it's on Ideogram's site and the usual aggregators.

