LLMs & Foundation Models

Z.ai Open-Sources GLM-4.6V with Native Tool Calling

Two new multimodal models can see images and execute functions in a single pipeline.

Andrés Martínez, AI Content Writer
December 8, 2025 · 2 min read

Z.ai dropped GLM-4.6V today, an open-source multimodal model series that bakes function calling directly into vision processing. The release includes a 106B-parameter flagship for cloud deployment and a 9B-parameter Flash variant built for local inference.

The core pitch: images and documents go straight into tool parameters without first being converted to text descriptions. When a tool returns a chart or screenshot, the model can read it and feed that visual understanding into the next step. Z.ai calls this closing the loop from perception to execution. Practical applications include generating image-text articles from uploaded slides, running visual web searches, and converting screenshots to functional HTML/CSS code.
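To make the perception-to-execution loop concrete, here is a minimal sketch of what a single request combining an image input with a tool definition could look like in OpenAI-style chat format. The model id (`glm-4.6v`), the image URL, and the `render_html` tool are illustrative assumptions, not Z.ai's documented schema.

```python
import json

# Hypothetical tool the model can call after reading the screenshot.
tools = [{
    "type": "function",
    "function": {
        "name": "render_html",  # assumed tool name for illustration
        "description": "Convert a described UI layout into HTML/CSS.",
        "parameters": {
            "type": "object",
            "properties": {"layout": {"type": "string"}},
            "required": ["layout"],
        },
    },
}]

# One message mixes an image with text; the model sees the screenshot
# directly rather than a text description of it.
request = {
    "model": "glm-4.6v",  # assumed model id
    "messages": [{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": "https://example.com/screenshot.png"}},
            {"type": "text",
             "text": "Recreate this screenshot as HTML/CSS via the tool."},
        ],
    }],
    "tools": tools,
}

# The payload serializes to ordinary JSON for the API call.
body = json.dumps(request)
```

The point of the sketch is structural: the image sits inside the same message that triggers the tool call, so no intermediate captioning step is needed.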

GLM-4.6V was trained with a 128K-token context window, enough to process roughly 150 pages of complex documents or an hour of video in one pass. On standard multimodal benchmarks (MMBench, MathVista, OCRBench), Z.ai claims state-of-the-art results among open-source models of similar scale.

Weights are live on Hugging Face and ModelScope. The model supports vLLM and SGLang for high-throughput inference. Developers can also access it through Z.ai's OpenAI-compatible API.
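For developers going the API route, a request against an OpenAI-compatible endpoint can be built with nothing but the standard library. This is a sketch only: the base URL and model id below are assumptions, and the actual network call is left commented out.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.z.ai/v1"  # assumed endpoint, check Z.ai's docs

payload = {
    "model": "glm-4.6v",  # assumed model id
    "messages": [
        {"role": "user", "content": "Summarize the attached slide deck."},
    ],
}

# Standard OpenAI-compatible chat-completions request.
req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {os.environ.get('ZAI_API_KEY', '')}",
        "Content-Type": "application/json",
    },
)

# resp = urllib.request.urlopen(req)  # network call omitted in this sketch
```

Because the endpoint follows the OpenAI wire format, existing OpenAI client libraries should also work by pointing their base URL at Z.ai's API.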

The Bottom Line: GLM-4.6V is the first major open-source multimodal model with native function calling baked in, available now in 106B and 9B parameter versions.


QUICK FACTS

  • GLM-4.6V: 106B parameters (cloud); GLM-4.6V-Flash: 9B parameters (local)
  • Context window: 128K tokens
  • Handles ~150 document pages or 1 hour of video per inference
  • Models released on Hugging Face and ModelScope
  • Available through Z.ai API and chat.z.ai interface
Tags: GLM-4.6V, Z.ai, multimodal AI, open source, function calling, vision language model, Zhipu
Andrés Martínez

AI Content Writer

Andrés reports on the AI stories that matter right now. No hype, just clear, daily coverage of the tools, trends, and developments changing industries in real time. He makes the complex feel routine.


