Zhipu AI, the Beijing-based company behind the GLM model family, has open-sourced GLM-OCR, a multimodal OCR model built for complex document parsing. At 0.9 billion parameters, it scores 94.62 on OmniDocBench V1.5 in company-run benchmarks, a result Zhipu says places it first overall on that leaderboard. The company also claims performance approaching Gemini-3-Pro on formula recognition, table extraction, and information-retrieval tasks.
The model pairs Zhipu's CogViT visual encoder with a GLM-0.5B language decoder. Zhipu credits Multi-Token Prediction loss and reinforcement-learning training for improved generalization on complex layouts. The full pipeline runs PP-DocLayoutV3 layout detection before OCR of the detected regions in parallel, and throughput reaches 1.86 pages per second on PDFs under Zhipu's own testing conditions.
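The two-stage design (layout detection feeding parallel region-level OCR) can be sketched as follows. This is a minimal illustration, not the GLM-OCR SDK: the function names, the `Region` type, and the stub outputs are hypothetical stand-ins for PP-DocLayoutV3 and the model call.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Region:
    kind: str    # e.g. "text", "table", "formula"
    bbox: tuple  # (x0, y0, x1, y1) in page coordinates


def detect_layout(page) -> list[Region]:
    # Stand-in for the PP-DocLayoutV3 step: split a page into typed regions.
    return [Region("text", (0, 0, 100, 40)), Region("table", (0, 40, 100, 90))]


def recognize(page, region: Region) -> str:
    # Stand-in for one GLM-OCR model call on a cropped region.
    return f"<{region.kind} content>"


def parse_page(page) -> str:
    regions = detect_layout(page)
    # Detected regions are independent, so they can be recognized in parallel;
    # map() preserves reading order in the joined result.
    with ThreadPoolExecutor() as pool:
        parts = pool.map(lambda r: recognize(page, r), regions)
    return "\n".join(parts)


print(parse_page("page-1"))
```

Parallelizing over regions rather than whole pages is one plausible way a 0.9B model reaches the reported per-page throughput.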
Weights are on Hugging Face and ModelScope under an MIT license. The SDK and inference toolchain live on GitHub, with deployment support for vLLM, SGLang, and Ollama. Zhipu also offers a hosted API at around ¥0.2 per million tokens.
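At that price, a rough cost-per-page estimate is simple arithmetic. The tokens-per-page figure below is an illustrative assumption, not a number from Zhipu:

```python
PRICE_PER_TOKEN_CNY = 0.2 / 1_000_000  # hosted API: ¥0.2 per million tokens
TOKENS_PER_PAGE = 1_500                # assumed average output per parsed page

cost_per_page = PRICE_PER_TOKEN_CNY * TOKENS_PER_PAGE
cost_per_10k_pages = cost_per_page * 10_000
print(f"≈ ¥{cost_per_page:.4f} per page, ¥{cost_per_10k_pages:.0f} per 10,000 pages")
```

Under that assumption, a 10,000-page batch would cost on the order of a few yuan, which is the economics the pricing implies.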
The release comes weeks after Zhipu's January IPO in Hong Kong, where the company debuted as Knowledge Atlas Technology (HKEX: 02513) and raised roughly $558 million, becoming the first large language model company to go public.
The Bottom Line: A sub-1B model topping document OCR benchmarks is notable, though Zhipu's performance claims are self-reported and await independent verification.
QUICK FACTS
- Model size: 0.9B parameters
- OmniDocBench V1.5 score: 94.62 (company-reported, claimed #1)
- PDF throughput: 1.86 pages/second (company-reported)
- API pricing: ¥0.2 per million tokens
- License: MIT (model), Apache 2.0 (SDK and pipeline code)
- Company: Zhipu AI / Knowledge Atlas Technology (HKEX: 02513)
