Zhipu Open-Sources GLM-OCR, a 0.9B Model That Tops Document Parsing Benchmarks


Zhipu AI, the Beijing-based company behind the GLM model family, has open-sourced GLM-OCR, a multimodal OCR model built for complex document parsing. At 0.9 billion parameters, it scores 94.62 on OmniDocBench V1.5 according to the company's own benchmarks, a result Zhipu says puts it at #1 overall on that leaderboard. The company also says its performance approaches that of Gemini-3-Pro on formula recognition, table extraction, and information retrieval tasks.

The model pairs Zhipu's CogViT visual encoder with a GLM-0.5B language decoder. Zhipu says a Multi-Token Prediction (MTP) loss and reinforcement learning during training improve generalization on complex layouts. In the full pipeline, PP-DocLayoutV3 first detects the layout of each page, and the detected regions are then recognized in parallel. Zhipu reports throughput of 1.86 pages per second on PDFs under its own testing conditions.
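
To make the two-stage design concrete, here is a minimal Python sketch of "detect layout first, then recognize regions in parallel." It is not Zhipu's code: `Region`, `detect_layout`, `recognize`, and `parse_page` are hypothetical names, and the two stage functions are simple placeholders standing in for PP-DocLayoutV3 and the GLM-OCR model call.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Region:
    """One layout region on a page (hypothetical structure)."""
    kind: str     # e.g. "text", "table", "formula"
    order: int    # reading order on the page
    payload: str  # stand-in for the cropped region image


def detect_layout(page: list[tuple[str, str]]) -> list[Region]:
    """Placeholder for a layout detector such as PP-DocLayoutV3."""
    return [Region(kind, i, payload) for i, (kind, payload) in enumerate(page)]


def recognize(region: Region) -> str:
    """Placeholder for the OCR model call on a single region."""
    return f"[{region.kind}] {region.payload}"


def parse_page(page: list[tuple[str, str]], max_workers: int = 4) -> str:
    """Stage 1: detect regions. Stage 2: recognize them concurrently."""
    regions = detect_layout(page)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in input order, so reading order
        # is preserved even though regions are processed concurrently.
        outputs = list(pool.map(recognize, regions))
    return "\n".join(outputs)


if __name__ == "__main__":
    demo_page = [("text", "Quarterly report"), ("table", "a | b"), ("formula", "E = mc^2")]
    print(parse_page(demo_page))
```

Splitting a page into regions lets a small recognizer work on short, focused inputs, which is one plausible reason a sub-1B model can keep up on dense layouts. In a real deployment the placeholder `recognize` would be replaced by a request to whatever inference server hosts the model.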

Weights are on Hugging Face and ModelScope under an MIT license. The SDK and inference toolchain live on GitHub, with deployment support for vLLM, SGLang, and Ollama. Zhipu also offers a hosted API at around ¥0.2 per million tokens.
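
For a rough sense of scale, the reported figures can be turned into back-of-the-envelope estimates. The sketch below only does arithmetic on the company-reported throughput (1.86 pages per second) and API price (about ¥0.2 per million tokens). The function names are illustrative, and the token count per job is something you would have to measure yourself, since the article gives no tokens-per-page figure.

```python
PAGES_PER_SECOND = 1.86             # company-reported PDF throughput
PRICE_CNY_PER_MILLION_TOKENS = 0.2  # approximate hosted API price


def estimate_seconds(pages: int, pages_per_second: float = PAGES_PER_SECOND) -> float:
    """Time to process `pages` at a constant throughput."""
    return pages / pages_per_second


def estimate_cost_cny(total_tokens: int,
                      price_per_million: float = PRICE_CNY_PER_MILLION_TOKENS) -> float:
    """API cost in yuan for a given total token count."""
    return total_tokens / 1_000_000 * price_per_million


if __name__ == "__main__":
    # A 10,000-page archive at the reported throughput.
    print(f"{estimate_seconds(10_000) / 60:.1f} minutes")
    # 50 million tokens is an assumed workload, not a figure from Zhipu.
    print(f"¥{estimate_cost_cny(50_000_000):.2f}")
```

Both numbers depend on Zhipu's own test conditions, so treat the results as upper-level planning figures until they are checked on your hardware and documents.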

The release comes weeks after Zhipu's Hong Kong IPO in January, in which the company listed as Knowledge Atlas Technology (HKEX: 02513) and raised roughly $558 million. The listing made it the first large language model company to go public via IPO.

The Bottom Line: A sub-1B model topping document OCR benchmarks is notable, though Zhipu's performance claims are self-reported and await independent verification.

QUICK FACTS

Model size: 0.9B parameters
OmniDocBench V1.5 score: 94.62 (company-reported, claimed #1)
PDF throughput: 1.86 pages/second (company-reported)
API pricing: ¥0.2 per million tokens
License: MIT (model), Apache 2.0 (SDK and pipeline code)
Company: Zhipu AI / Knowledge Atlas Technology (HKEX: 02513)

Tags: GLM-OCR, Zhipu AI, OCR, document parsing, open source, OmniDocBench, computer vision

Andrés Martínez

AI Content Writer

Andrés reports on the AI stories that matter right now. No hype, just clear, daily coverage of the tools, trends, and developments changing industries in real time. He makes the complex feel routine.
