Zhipu Open-Sources GLM-OCR, a 0.9B Model That Tops Document Parsing Benchmarks

Sub-1B parameter model claims #1 on OmniDocBench V1.5, rivaling Gemini-3-Pro on document understanding.

Andrés Martínez, AI Content Writer
February 3, 2026 · 2 min read

Zhipu AI, the Beijing-based company behind the GLM model family, has open-sourced GLM-OCR, a multimodal OCR model built for complex document parsing. At 0.9 billion parameters, it scores 94.62 on OmniDocBench V1.5 according to the company's own benchmarks, a result Zhipu says puts it at #1 overall on that leaderboard. The company also says its performance approaches that of Gemini-3-Pro on formula recognition, table extraction, and information retrieval tasks.

The model pairs Zhipu's CogViT visual encoder with a GLM-0.5B language decoder. According to Zhipu, training with a Multi-Token Prediction loss and reinforcement learning is intended to improve generalization on complex layouts. The full pipeline first runs PP-DocLayoutV3 for layout detection, then performs OCR in parallel. Zhipu reports throughput of 1.86 PDF pages per second under its own test conditions.
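The two-stage flow described above (layout detection first, then OCR in parallel) can be sketched roughly as follows. This is an illustrative sketch only, not Zhipu's SDK: `Region`, `detect_layout`, `recognize_region`, and `parse_page` are hypothetical names, both stages are placeholder stubs, and running OCR per detected region in a thread pool is an assumption about what "parallel" means here.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    kind: str     # "text", "table", or "formula"
    payload: str  # stand-in for the cropped image of this region


def detect_layout(page: list[tuple[str, str]]) -> list[Region]:
    # Hypothetical placeholder for a layout detector. A real one would take
    # a page image and return labeled regions in reading order.
    return [Region(kind, payload) for kind, payload in page]


def recognize_region(region: Region) -> str:
    # Hypothetical placeholder for the OCR model call on a single region.
    return f"[{region.kind}] {region.payload.strip()}"


def parse_page(page: list[tuple[str, str]], max_workers: int = 4) -> list[str]:
    # Stage 1: split the page into regions.
    regions = detect_layout(page)
    # Stage 2: recognize regions concurrently. Executor.map yields results
    # in input order, so the reading order from stage 1 is preserved.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(recognize_region, regions))


page = [("text", " Quarterly report "), ("table", "a | b"), ("formula", "E = mc^2")]
result = parse_page(page)
print(result)
```

The point of the split is that regions become independent work items once the layout is known, so they can be batched or sent to the model concurrently.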

The weights are available on Hugging Face and ModelScope under an MIT license. The SDK and inference toolchain are on GitHub under Apache 2.0, with deployment support for vLLM, SGLang, and Ollama. Zhipu also offers a hosted API at around ¥0.2 per million tokens.
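For a rough sense of scale, the company-reported figures can be turned into a back-of-envelope estimate. The sketch below takes the 1.86 pages/second and ¥0.2 per million tokens at face value. The tokens-per-page value is a made-up assumption, the function names are hypothetical, and the throughput figure comes from Zhipu's own test setup, so it may not apply to the hosted API.

```python
PAGES_PER_SECOND = 1.86              # company-reported PDF throughput
PRICE_CNY_PER_MILLION_TOKENS = 0.2   # approximate hosted API price


def estimate_seconds(pages: int) -> float:
    # Processing time if the reported throughput held for the whole job.
    return pages / PAGES_PER_SECOND


def estimate_cost_cny(pages: int, tokens_per_page: int) -> float:
    # tokens_per_page is an assumption; real usage depends on page content
    # and on how the API counts tokens.
    total_tokens = pages * tokens_per_page
    return total_tokens / 1_000_000 * PRICE_CNY_PER_MILLION_TOKENS


seconds = estimate_seconds(10_000)
cost = estimate_cost_cny(10_000, tokens_per_page=1_500)
print(f"{seconds / 60:.1f} minutes, about ¥{cost:.2f}")
```

Under these assumptions, 10,000 pages would take roughly 90 minutes and cost about ¥3. Both numbers scale linearly with the assumed inputs.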

This release comes weeks after Zhipu's Hong Kong IPO in January, where the company debuted as Knowledge Atlas Technology (HKEX: 02513), raising roughly $558 million. The listing made it the first large language model company to go public via IPO.

The Bottom Line: A sub-1B model topping document OCR benchmarks is notable, though Zhipu's performance claims are self-reported and await independent verification.


QUICK FACTS

  • Model size: 0.9B parameters
  • OmniDocBench V1.5 score: 94.62 (company-reported, claimed #1)
  • PDF throughput: 1.86 pages/second (company-reported)
  • API pricing: ¥0.2 per million tokens
  • License: MIT (model), Apache 2.0 (SDK and pipeline code)
  • Company: Zhipu AI / Knowledge Atlas Technology (HKEX: 02513)