Z.ai, the company formerly known as Zhipu AI, released GLM-5V-Turbo this week: a multimodal model that processes images, video, and text to generate working code. The pitch is straightforward. Feed it a design mockup, get a runnable frontend project back. Available now via API and a free web interface at chat.z.ai, the model is priced at $1.20 per million input tokens and $4 per million output tokens through OpenRouter.
The company-reported benchmarks are eye-catching. On Design2Code, which measures how well models reproduce UI mockups as code, Z.ai claims a score of 94.8 against Claude Opus 4.6's 77.3. It also posts strong numbers on GUI agent benchmarks like AndroidWorld and WebVoyager. These are all self-reported figures, and independent testing hasn't confirmed them. Pure text coding tells a different story: Claude still leads on backend tasks and repo exploration in CC-Bench-V2, per third-party analysis.
Under the hood, the model uses a new CogViT vision encoder and multi-token prediction architecture, with a 200K context window and 128K max output. Z.ai trained it with joint reinforcement learning across 30-plus task types to prevent the usual tradeoff where visual gains erode coding ability. The model integrates with Claude Code and Z.ai's own OpenClaw agent framework, per the MarkTechPost coverage.
So it's a specialist. If your workflow involves turning screenshots into HTML/CSS or debugging UI rendering issues from images, GLM-5V-Turbo looks competitive. For general-purpose coding, it's not displacing anything yet. Z.ai is also accepting trial applications for its Coding Plan.
Bottom Line
GLM-5V-Turbo claims a Design2Code score of 94.8 versus Claude Opus 4.6's 77.3, but those are self-reported numbers and the model trails Claude on pure text coding benchmarks.
Quick Facts
- Design2Code score: 94.8 (company-reported)
- API pricing: $1.20/M input tokens, $4/M output tokens
- Context window: 200K tokens, max output 128K tokens
- Architecture: CogViT vision encoder, MTP inference
- RL training across 30+ task types




