Mistral Releases OCR 4 With Bounding Boxes and Block Types

New document model adds segmentation, confidence scores, and a self-hosting option, with API pricing at $4 per 1,000 pages.

Andrés Martínez, AI Content Writer
June 24, 2026 · 2 min read

Mistral shipped OCR 4 on Monday, and the pitch isn't just cleaner text extraction. The model now returns a structured map of each document: bounding boxes around every block, typed classification for titles, tables, equations and signatures, plus per-page and per-word confidence scores. Mistral laid out the details on its research blog.

That structure is the point. Bounding boxes were the most-requested feature, and they feed the downstream work people actually care about: RAG chunking, enterprise search, redaction, and human-in-the-loop verification, where a reviewer needs to see exactly where a flagged value sits on the page. Coverage spans 170 languages across 10 language groups, with Mistral claiming the biggest gains on rare and low-resource scripts.
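To make those downstream uses concrete, here is a minimal Python sketch of how typed blocks and per-word confidence might be consumed. The `Word` and `Block` shapes, the field names, and the 0.8 threshold are all assumptions for illustration; they are not Mistral's actual response schema, which the article does not describe.

```python
from dataclasses import dataclass

# Hypothetical shapes for illustration only; not Mistral's actual response schema.
@dataclass
class Word:
    text: str
    confidence: float  # assumed to lie in [0, 1]


@dataclass
class Block:
    block_type: str  # e.g. "title", "table", "equation", "signature"
    bbox: tuple[float, float, float, float]  # assumed (x0, y0, x1, y1) page coordinates
    page: int
    words: list[Word]

    @property
    def text(self) -> str:
        return " ".join(w.text for w in self.words)


def chunk_by_title(blocks: list[Block]) -> list[list[Block]]:
    """Start a new chunk at each title block so a heading stays with its content."""
    chunks: list[list[Block]] = []
    for block in blocks:
        if block.block_type == "title" or not chunks:
            chunks.append([])
        chunks[-1].append(block)
    return chunks


def flag_for_review(blocks: list[Block], threshold: float = 0.8):
    """Return (page, bbox, word) for every word below the confidence threshold."""
    return [
        (b.page, b.bbox, w.text)
        for b in blocks
        for w in b.words
        if w.confidence < threshold
    ]
```

In a setup like this, the chunks would go to a RAG index, while the flagged tuples give a reviewer the page and rectangle to inspect rather than a bare string.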

Treat the numbers as company-reported. Mistral says independent annotators preferred OCR 4 over every rival tested, averaging a 72% win rate across a set of 600-plus documents in 12-plus languages, and that it scored 85.20 on OlmOCRBench. The blog itself flags that benchmark scoring has known artifacts, so the team calls the aggregate "directional rather than definitive." Fair enough.

The model is compact enough to run in a single container, so document data can stay inside a company's own infrastructure. API pricing is $4 per 1,000 pages, halved to $2 through the Batch API. The fuller Document AI layer, which reshapes output into custom JSON schemas, runs $5 per 1,000 pages. A production webinar is set for July 7.
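For budgeting, the three price points reduce to simple arithmetic. The sketch below assumes billing is strictly linear per page, with no minimums, rounding rules, or volume discounts, none of which the article confirms; the `estimate_cost` helper and the tier names are hypothetical.

```python
from decimal import Decimal

# Per-1,000-page prices in USD, as reported in the article.
PRICE_PER_1K = {
    "api": Decimal("4"),
    "batch": Decimal("2"),
    "document_ai": Decimal("5"),
}


def estimate_cost(pages: int, tier: str = "api") -> Decimal:
    """Estimate cost in USD, assuming linear per-page billing (an assumption)."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return PRICE_PER_1K[tier] * pages / Decimal(1000)
```

Under that assumption, a 250,000-page archive would come to $1,000 through the standard API, $500 through the Batch API, and $1,250 through the Document AI layer.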


Bottom Line

OCR 4 outputs bounding boxes, typed blocks, and confidence scores, can run self-hosted in a single container, and costs $4 per 1,000 pages through the API.

Quick Facts

  • Price: $4 per 1,000 pages via API, $2 via Batch API
  • Document AI layer: $5 per 1,000 pages
  • OlmOCRBench score: 85.20 (company-reported)
  • 72% average win rate in blind human evaluation (company-reported, 600+ docs)
  • 170 languages across 10 language groups
  • Released June 23, 2026
Tags: Mistral AI, OCR, document AI, RAG, enterprise search, machine learning