LLMs & Foundation Models

Z.ai Open-Sources GLM-4.6V with Native Tool Calling

Two new multimodal models can see images and execute functions in a single pipeline.

Andrés Martínez, AI Content Writer
December 8, 2025 · 2 min read

Z.ai dropped GLM-4.6V today, an open-source multimodal model series that bakes function calling directly into vision processing. The release includes a 106B-parameter flagship for cloud deployment and a 9B-parameter Flash variant built for local inference.

The core pitch: images and documents go straight into tool parameters without first being converted to text descriptions. When a tool returns a chart or screenshot, the model can read it and feed that visual understanding into the next step. Z.ai calls this closing the loop from perception to execution. Practical applications include generating image-text articles from uploaded slides, running visual web searches, and converting screenshots to functional HTML/CSS code.
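To make the perception-to-execution loop concrete, here is a minimal sketch of what a single request combining an image input with a tool definition could look like in OpenAI-style chat format. The model id (`glm-4.6v`), the image URL, and the `render_html` tool are illustrative assumptions, not Z.ai's documented schema.

```python
import json

# Hypothetical tool the model can call after reading the screenshot.
tools = [{
    "type": "function",
    "function": {
        "name": "render_html",  # assumed tool name for illustration
        "description": "Convert a described UI layout into HTML/CSS.",
        "parameters": {
            "type": "object",
            "properties": {"layout": {"type": "string"}},
            "required": ["layout"],
        },
    },
}]

# One message mixes an image with text; the model sees the screenshot
# directly rather than a text description of it.
request = {
    "model": "glm-4.6v",  # assumed model id
    "messages": [{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": "https://example.com/screenshot.png"}},
            {"type": "text",
             "text": "Recreate this screenshot as HTML/CSS via the tool."},
        ],
    }],
    "tools": tools,
}

# The payload serializes to ordinary JSON for the API call.
body = json.dumps(request)
```

The point of the sketch is structural: the image sits inside the same message that triggers the tool call, so no intermediate captioning step is needed.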

GLM-4.6V was trained with a 128K-token context window, enough to process roughly 150 pages of complex documents or an hour of video in one pass. On standard multimodal benchmarks (MMBench, MathVista, OCRBench), Z.ai claims state-of-the-art results among open-source models of similar scale.

Weights are live on Hugging Face and ModelScope. The model supports vLLM and SGLang for high-throughput inference. Developers can also access it through Z.ai's OpenAI-compatible API.
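For developers going the API route, a request against an OpenAI-compatible endpoint can be built with nothing but the standard library. This is a sketch only: the base URL and model id below are assumptions, and the actual network call is left commented out.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.z.ai/v1"  # assumed endpoint, check Z.ai's docs

payload = {
    "model": "glm-4.6v",  # assumed model id
    "messages": [
        {"role": "user", "content": "Summarize the attached slide deck."},
    ],
}

# Standard OpenAI-compatible chat-completions request.
req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {os.environ.get('ZAI_API_KEY', '')}",
        "Content-Type": "application/json",
    },
)

# resp = urllib.request.urlopen(req)  # network call omitted in this sketch
```

Because the endpoint follows the OpenAI wire format, existing OpenAI client libraries should also work by pointing their base URL at Z.ai's API.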

The Bottom Line: GLM-4.6V is the first major open-source multimodal model with native function calling baked in, available now in 106B and 9B parameter versions.


QUICK FACTS

  • GLM-4.6V: 106B parameters (cloud); GLM-4.6V-Flash: 9B parameters (local)
  • Context window: 128K tokens
  • Handles ~150 document pages or 1 hour of video per inference
  • Models released on Hugging Face and ModelScope
  • Available through Z.ai API and chat.z.ai interface
Tags: GLM-4.6V, Z.ai, multimodal AI, open source, function calling, vision language model, Zhipu
Andrés Martínez

AI Content Writer

Andrés reports on the AI stories that matter right now. No hype, just clear, daily coverage of the tools, trends, and developments changing industries in real time. He makes the complex feel routine.


