Z.ai GLM-5V-Turbo: Vision-to-Code AI Model Launches

Abstract visualization of a design mockup being transformed into lines of code through an AI model pipeline

Z.ai, the company formerly known as Zhipu AI, released GLM-5V-Turbo this week: a multimodal model that processes images, video, and text to generate working code. The pitch is straightforward. Feed it a design mockup, get a runnable frontend project back. Available now via API and a free web interface at chat.z.ai, the model is priced at $1.20 per million input tokens and $4 per million output tokens through OpenRouter.

The company-reported benchmarks are eye-catching. On Design2Code, which measures how well models reproduce UI mockups as code, Z.ai claims a score of 94.8 against Claude Opus 4.6's 77.3. It also posts strong numbers on GUI agent benchmarks like AndroidWorld and WebVoyager. These are all self-reported figures, and independent testing hasn't confirmed them. Pure text coding tells a different story: Claude still leads on backend tasks and repo exploration in CC-Bench-V2, per third-party analysis.

Under the hood, the model uses a new CogViT vision encoder and multi-token prediction architecture, with a 200K context window and 128K max output. Z.ai trained it with joint reinforcement learning across 30-plus task types to prevent the usual tradeoff where visual gains erode coding ability. The model integrates with Claude Code and Z.ai's own OpenClaw agent framework, per the MarkTechPost coverage.

So it's a specialist. If your workflow involves turning screenshots into HTML/CSS or debugging UI rendering issues from images, GLM-5V-Turbo looks competitive. For general-purpose coding, it's not displacing anything yet. Z.ai is also accepting trial applications for its Coding Plan.

Bottom Line

GLM-5V-Turbo claims a Design2Code score of 94.8 versus Claude Opus 4.6's 77.3, but those are self-reported numbers and the model trails Claude on pure text coding benchmarks.

Quick Facts

Design2Code score: 94.8 (company-reported)
API pricing: $1.20/M input tokens, $4/M output tokens
Context window: 200K tokens, max output 128K tokens
Architecture: CogViT vision encoder, MTP inference
RL training across 30+ task types

Tags:Z.aiGLM-5V-Turbomultimodal AIdesign-to-codevision language modelZhipu AIAI coding

Andrés Martínez

AI Content Writer

Andrés reports on the AI stories that matter right now. No hype, just clear, daily coverage of the tools, trends, and developments changing industries in real time. He makes the complex feel routine.

Z.ai Launches GLM-5V-Turbo, a Vision-to-Code AI Model

Bottom Line

Quick Facts

Andrés Martínez

Related Articles

Alibaba Launches Qwen3.6-Plus With Agentic Coding and 1M Context

Google DeepMind Demos a Browser That Generates Pages Instead of Loading Them

OpenAI Adds Plugin System to Codex, Five Months After Anthropic Did the Same

Stay Ahead of the AI Curve