A Palo Alto startup you may not have heard of just landed at number two on the text-to-image leaderboard. Reve AI shipped Reve 2.0, and on the Arena board dated June 3 it scored 1280 (plus or minus 11) from 3,455 votes, sitting behind OpenAI's GPT Image 2 and just ahead of Google's Gemini 3.1 Flash Image Preview, the model better known as Nano Banana 2.
The interesting part isn't the ranking. Reve's pitch is that the model decomposes images into what it calls a layout representation, a code-like internal structure where every element carries a location, size, and description. You build the composition first, then render in native 4K. Edit an object and the rest of the scene survives intact.
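To make the idea concrete, here is a minimal sketch of what a layout of this kind could look like as data. Reve has not published its internal format, so the `Element` class, its field names, and the `edit_element` helper are all hypothetical. The sketch only illustrates the stated property that each element carries a location, size, and description, and that editing one element leaves the others untouched.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Element:
    """Hypothetical layout element: not Reve's actual schema."""
    name: str
    x: int            # left edge, in pixels
    y: int            # top edge, in pixels
    width: int
    height: int
    description: str  # text describing what to render in this region


def edit_element(layout, name, **changes):
    """Return a new layout with one element changed; the others are reused as-is."""
    return [replace(e, **changes) if e.name == name else e for e in layout]


layout = [
    Element("sky", 0, 0, 3840, 1200, "overcast sky at dusk"),
    Element("house", 1200, 900, 1400, 1000, "red brick house"),
]

# Change only the house; the sky element is carried over unchanged.
edited = edit_element(layout, "house", description="blue wooden house")
```

Because the elements are immutable, the original `layout` still describes the red brick house after the edit, which is what makes a before/after comparison possible.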
Reve frames this as image generation moving toward program synthesis, which opens the door to version control and diff-based edits that freeform prompts can't really support. The company also claims it's the best image model from any sub-$1 trillion company and that it trained on 10x fewer GPUs. That GPU figure is self-reported and hasn't been independently checked.
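The version-control angle is easier to see with a toy example. The sketch below assumes a layout can be written as a plain mapping from element name to its fields, and computes a field-level diff between two versions. The `diff_layouts` function and the dictionary shape are illustrative assumptions, not Reve's format or API.

```python
def diff_layouts(old, new):
    """Compare two layouts keyed by element name.

    Returns a list of ("added" | "removed" | "changed", name, detail) tuples.
    For "changed", detail maps each differing field to (old_value, new_value).
    """
    changes = []
    for name in sorted(old.keys() | new.keys()):
        if name not in old:
            changes.append(("added", name, new[name]))
        elif name not in new:
            changes.append(("removed", name, old[name]))
        else:
            fields = {
                key: (old[name].get(key), new[name].get(key))
                for key in sorted(old[name].keys() | new[name].keys())
                if old[name].get(key) != new[name].get(key)
            }
            if fields:
                changes.append(("changed", name, fields))
    return changes


# Hypothetical layouts: version 2 repaints the house and adds a tree.
v1 = {
    "sky": {"x": 0, "y": 0, "w": 3840, "h": 1200, "description": "overcast sky"},
    "house": {"x": 1200, "y": 900, "w": 1400, "h": 1000, "description": "red brick house"},
}
v2 = {
    "sky": {"x": 0, "y": 0, "w": 3840, "h": 1200, "description": "overcast sky"},
    "house": {"x": 1200, "y": 900, "w": 1400, "h": 1000, "description": "blue wooden house"},
    "tree": {"x": 300, "y": 1000, "w": 500, "h": 900, "description": "bare oak tree"},
}

for change in diff_layouts(v1, v2):
    print(change)
```

A freeform prompt has no equivalent: changing one word can alter the whole image, so there is nothing stable to diff against.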
One company-reported benchmark shows CLIP similarity climbing from 0.865 with no layout regions to 0.929 with 50. Some users hit signup errors and quota issues on launch day, so the rollout wasn't perfectly smooth.
Reve 2.0 is free to try with daily limits at app.reve.com. An API is in beta.
Bottom Line
Reve 2.0 scored 1280 on the June 3 Arena leaderboard, second only to GPT Image 2 and ahead of Nano Banana 2.
Quick Facts
- Arena score: 1280 +/- 11, ranked #2 (June 3, 2026)
- Vote count: 3,455
- Behind GPT Image 2, ahead of Gemini 3.1 Flash Image Preview (Nano Banana 2)
- Native 4K output via layout representation
- Company-reported: trained on 10x fewer GPUs (unverified)
- CLIP similarity 0.865 to 0.929 across 0 to 50 layout regions (company-reported)




