Mistral OCR 4 Adds Bounding Boxes, Self-Hosting

Abstract visualization of a scanned document broken into labeled rectangular blocks for headings, tables, and equations

Mistral shipped OCR 4 on Monday, and the pitch isn't just cleaner text extraction. The model now returns a structured map of each document: bounding boxes around every block, typed classification for titles, tables, equations and signatures, plus per-page and per-word confidence scores. Mistral laid out the details on its research blog.
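Here is one way a consumer might model a page of that output and pull out shaky words for a reviewer. It is a minimal sketch. The class and field names (`Page`, `Block`, `Word`, `bbox`, `kind`, `confidence`) and the 0.8 threshold are hypothetical, not Mistral's actual response schema, so check the official API docs for the real format.

```python
from dataclasses import dataclass, field

# Hypothetical schema for illustration only; NOT Mistral's real response format.
# A bounding box as (x0, y0, x1, y1) in page coordinates.
BBox = tuple[float, float, float, float]


@dataclass
class Word:
    text: str
    bbox: BBox
    confidence: float  # assumed per-word score in [0, 1]


@dataclass
class Block:
    kind: str  # e.g. "title", "table", "equation", "signature", "text"
    bbox: BBox
    words: list[Word] = field(default_factory=list)


@dataclass
class Page:
    number: int
    confidence: float  # assumed per-page score in [0, 1]
    blocks: list[Block] = field(default_factory=list)


def low_confidence_words(page: Page, threshold: float = 0.8) -> list[tuple[str, str, BBox]]:
    """Return (block kind, word text, word bbox) for every word scored below threshold."""
    flagged = []
    for block in page.blocks:
        for word in block.words:
            if word.confidence < threshold:
                flagged.append((block.kind, word.text, word.bbox))
    return flagged


# Made-up example page: one clean title word, one doubtful table value.
page = Page(
    number=1,
    confidence=0.93,
    blocks=[
        Block("title", (50, 40, 550, 80), [Word("Invoice", (50, 40, 180, 80), 0.99)]),
        Block("table", (50, 120, 550, 300), [Word("$1,2O0", (400, 200, 480, 220), 0.61)]),
    ],
)
print(low_confidence_words(page))
```

Returning the word's box with each flag is what would let a review screen highlight the exact spot on the page.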

That structure is the point. Bounding boxes were the most-requested feature, and they feed the downstream work people actually care about: RAG chunking, enterprise search, redactions, and human-in-the-loop verification where someone needs to see where a flagged value sits on the page. Coverage spans 170 languages across 10 groups, with Mistral claiming the biggest gains on rare and low-resource scripts.
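Typed blocks are what make structure-aware RAG chunking possible. The sketch below starts a new chunk at each title, keeps every table whole in its own chunk, and records the page and box of each source block so a search hit can point back to its region. The `Block`/`Chunk` shapes and the chunking policy are illustrative assumptions, not Mistral's API or a recommended method.

```python
from dataclasses import dataclass

BBox = tuple[float, float, float, float]  # (x0, y0, x1, y1)


# Hypothetical block shape for illustration; not Mistral's response schema.
@dataclass
class Block:
    kind: str  # "title", "text", "table", ...
    text: str
    page: int
    bbox: BBox


@dataclass
class Chunk:
    heading: str
    text: str
    sources: list[tuple[int, BBox]]  # (page, bbox) for each block in the chunk


def chunk_by_structure(blocks: list[Block]) -> list[Chunk]:
    """Group blocks under the most recent title; give each table its own chunk."""
    chunks: list[Chunk] = []
    heading = ""
    current: list[Block] = []

    def flush() -> None:
        # Emit the pending run of non-title, non-table blocks as one chunk.
        if current:
            chunks.append(Chunk(
                heading=heading,
                text="\n".join(b.text for b in current),
                sources=[(b.page, b.bbox) for b in current],
            ))
            current.clear()

    for block in blocks:
        if block.kind == "title":
            flush()
            heading = block.text
        elif block.kind == "table":
            flush()
            chunks.append(Chunk(heading, block.text, [(block.page, block.bbox)]))
        else:
            current.append(block)
    flush()
    return chunks


blocks = [
    Block("title", "Payment Terms", 1, (50, 40, 550, 70)),
    Block("text", "Net 30 days.", 1, (50, 80, 550, 100)),
    Block("table", "| Item | Cost |", 1, (50, 110, 550, 200)),
    Block("text", "Late fees apply.", 1, (50, 210, 550, 230)),
]
chunks = chunk_by_structure(blocks)
print([c.text for c in chunks])
```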

Treat the numbers as company-reported. Mistral says independent annotators preferred OCR 4 over every rival tested, averaging a 72% win rate on a set of 600-plus documents in 12-plus languages, and that it scored 85.20 on OlmOCRBench. The blog itself flags that benchmark scoring has known artifacts, so the team calls the aggregate "directional rather than definitive." Fair enough.
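The article doesn't say how the "average win rate" was computed. One common reading is a macro-average: compute OCR 4's win rate against each rival separately, then take the unweighted mean. The sketch below shows that reading only. The function name and input format are hypothetical, ties are ignored, and the toy data is made up, not Mistral's evaluation.

```python
from collections import defaultdict


def average_win_rate(judgments: list[tuple[str, bool]]) -> float:
    """judgments: (rival_name, ocr4_won) pairs. Per-rival win rate, then unweighted mean."""
    if not judgments:
        raise ValueError("need at least one judgment")
    wins: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for rival, won in judgments:
        totals[rival] += 1
        wins[rival] += int(won)
    rates = [wins[r] / totals[r] for r in totals]
    return sum(rates) / len(rates)


# Toy data: 3 of 4 wins against rival A, 1 of 2 against rival B.
toy = [("A", True)] * 3 + [("A", False)] + [("B", True), ("B", False)]
print(average_win_rate(toy))
```

Under this reading, each rival counts equally no matter how many documents it was compared on; a micro-average over all judgments would weight rivals by comparison count instead.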

The model is compact enough to run in a single container, so document data can stay inside a company's own infrastructure. API pricing is $4 per 1,000 pages, halved to $2 through the Batch API. The fuller Document AI layer, which reshapes output into custom JSON schemas, runs $5 per 1,000 pages. A production webinar is set for July 7.
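For a rough budget, the reported prices scale linearly with page count. This sketch assumes straight per-page proration with no minimums, volume discounts, or stacking of the Document AI price on top of base OCR. Those are assumptions, not published terms, and self-hosting costs aren't modeled at all. The tier keys are hypothetical labels.

```python
# Per-1,000-page prices as reported in the article (USD).
PRICE_PER_1K = {
    "api": 4.00,
    "batch": 2.00,
    "document_ai": 5.00,
}


def estimate_cost(pages: int, tier: str) -> float:
    """Estimated USD cost for `pages` pages, assuming linear per-page proration."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return pages / 1000 * PRICE_PER_1K[tier]


for tier in PRICE_PER_1K:
    print(f"{tier}: ${estimate_cost(250_000, tier):,.2f}")
```

At 250,000 pages that works out to $1,000 through the standard API, $500 through Batch, and $1,250 for Document AI.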

Bottom Line

OCR 4 outputs bounding boxes, typed blocks, and confidence scores, and can run self-hosted in a single container; API pricing is $4 per 1,000 pages.

Quick Facts

Price: $4 per 1,000 pages via API, $2 via Batch API
Document AI layer: $5 per 1,000 pages
OlmOCRBench score: 85.20 (company-reported)
72% average win rate in blind human evaluation (company-reported, 600+ docs)
170 languages across 10 language groups
Released June 23, 2026

Tags: Mistral AI, OCR, document AI, RAG, enterprise search, machine learning

Andrés Martínez

AI Content Writer

Andrés reports on the AI stories that matter right now. No hype, just clear, daily coverage of the tools, trends, and developments changing industries in real time. He makes the complex feel routine.
