Mistral OCR: The Future of Document Understanding with AI

Today, we’re at the precipice of the next big leap: unlocking the collective intelligence of all digitized information. Approximately 90% of the world’s organizational data is stored as documents, and to harness this potential, we are introducing Mistral OCR. Mistral OCR is an Optical Character Recognition API that sets a new standard in document understanding. Unlike other models, Mistral OCR comprehends each element of a document (media, text, tables, equations) with unprecedented accuracy and cognition. It takes images and PDFs as input and extracts their content as ordered, interleaved text and images.

As a result, Mistral OCR is an ideal model to use in combination with a RAG system taking multimodal documents (such as slides or complex PDFs) as input.
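To make the RAG pairing concrete, here is a minimal sketch of how interleaved OCR output could be split into retrieval chunks while keeping each image attached to the text around it. It assumes the extracted content arrives as one Markdown string per page, with images referenced inline as `![alt](image-id)`. That format, the `Chunk` type, and the `chunk_pages` helper are illustrative assumptions, not part of the Mistral OCR API.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Assumption: images appear inline as Markdown references, e.g. ![fig](img-0.jpeg).
IMAGE_REF = re.compile(r"!\[[^\]]*\]\(([^)]+)\)")


@dataclass
class Chunk:
    page: int                       # 1-based page number the chunk came from
    text: str                       # paragraph-aligned Markdown text
    image_ids: list[str] = field(default_factory=list)  # images referenced in the text


def chunk_pages(pages: list[str], max_chars: int = 800) -> list[Chunk]:
    """Split each page into paragraph-aligned chunks, keeping image references attached."""
    chunks: list[Chunk] = []
    for page_no, markdown in enumerate(pages, start=1):
        buffer = ""
        for paragraph in markdown.split("\n\n"):
            paragraph = paragraph.strip()
            if not paragraph:
                continue
            # Flush the buffer when adding this paragraph would exceed the limit.
            if buffer and len(buffer) + len(paragraph) + 2 > max_chars:
                chunks.append(Chunk(page_no, buffer, IMAGE_REF.findall(buffer)))
                buffer = ""
            buffer = f"{buffer}\n\n{paragraph}" if buffer else paragraph
        if buffer:
            chunks.append(Chunk(page_no, buffer, IMAGE_REF.findall(buffer)))
    return chunks
```

Each chunk can then be embedded and indexed by whatever retriever you already use, with `image_ids` letting the application fetch the matching figures at answer time.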

We have made Mistral OCR the default model for document understanding for millions of users on Le Chat, and are releasing the API, mistral-ocr-latest, at 1000 pages / $ (and approximately double the pages per dollar with batch inference). The API is available today on our developer suite la Plateforme, and is coming soon to our cloud and inference partners, as well as on-premises.
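As a rough illustration of the pricing above, the sketch below estimates the cost of a job from its page count. The 1000 pages per dollar figure comes from this post. The batch multiplier of exactly 2 is an assumption standing in for "approximately double", and the function name is hypothetical, so treat the result as an estimate rather than a quote.

```python
PAGES_PER_DOLLAR = 1000   # list price stated in this post
BATCH_MULTIPLIER = 2.0    # assumption: "approximately double" taken as exactly 2x


def estimate_cost_usd(pages: int, batch: bool = False) -> float:
    """Return an approximate cost in US dollars for processing `pages` pages."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    pages_per_dollar = PAGES_PER_DOLLAR * (BATCH_MULTIPLIER if batch else 1.0)
    return pages / pages_per_dollar


if __name__ == "__main__":
    # A 250,000-page archive: about $250 at list price, about $125 with batch inference.
    print(estimate_cost_usd(250_000))
    print(estimate_cost_usd(250_000, batch=True))
```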

Key Features of Mistral OCR

State-of-the-art understanding of complex documents
Natively multilingual and multimodal
Top-tier benchmarks
Fastest in its category
Doc-as-prompt, structured output
Selectively available to self-host for organizations dealing with highly sensitive or classified information

Understanding Document Components

Mistral OCR excels in understanding complex document elements, including interleaved imagery, mathematical expressions, tables, and advanced layouts such as LaTeX formatting. The model enables deeper understanding of rich documents such as scientific papers with charts, graphs, equations and figures.

Multilingual by Design

Since Mistral’s founding, we have aspired to serve the world with our models, and have consequently strived for multilingual capabilities across our offerings. Mistral OCR takes this to a new level, being able to parse, understand, and transcribe thousands of scripts, fonts, and languages across all continents. This versatility is crucial both for global organizations that handle documents from diverse linguistic backgrounds and for hyperlocal businesses serving niche markets.

Additional Advantages

Being lighter weight than most models in its category, Mistral OCR performs significantly faster than its peers, processing up to 2000 pages per minute on a single node. This speed supports continuous learning and improvement even in high-throughput environments.

Mistral OCR also introduces the use of documents as prompts, enabling more powerful and precise instructions. This capability allows users to extract specific information from documents and format it as structured output, such as JSON. Users can chain extracted outputs into downstream function calls and build agents.

For organizations with stringent data privacy requirements, Mistral OCR offers a self-hosting option. This keeps sensitive or classified information within your own infrastructure, helping you comply with regulatory and security standards. If you would like to explore self-deployment with us, please let us know.
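To illustrate how structured output from a document prompt can be chained into a downstream function call, here is a minimal sketch. It assumes the model was asked to return a JSON object with `vendor`, `total`, and `currency` fields for an invoice. That schema, the `Invoice` type, and the routing rule are hypothetical examples, not something the Mistral OCR API defines.

```python
import json
from dataclasses import dataclass


@dataclass
class Invoice:
    vendor: str
    total: float
    currency: str


def parse_invoice(raw_json: str) -> Invoice:
    """Validate the extracted JSON and convert it into a typed record."""
    data = json.loads(raw_json)
    # Fail loudly if the extraction missed a field rather than guessing a value.
    missing = {"vendor", "total", "currency"} - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return Invoice(str(data["vendor"]), float(data["total"]), str(data["currency"]))


def route_invoice(invoice: Invoice, approval_threshold: float = 1000.0) -> str:
    """Downstream step: choose a workflow based on the extracted total."""
    return "manual_review" if invoice.total >= approval_threshold else "auto_approve"


if __name__ == "__main__":
    # Hypothetical structured output for one scanned invoice.
    extracted = '{"vendor": "Acme", "total": "1250.50", "currency": "EUR"}'
    print(route_invoice(parse_invoice(extracted)))
```

Validating the fields before acting on them keeps an extraction error from silently reaching the function call.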

Mistral OCR has consistently outperformed other leading OCR models in rigorous benchmark tests. Its superior accuracy across multiple aspects of document analysis is illustrated below. We extract embedded images from documents along with text. The other LLMs compared below do not have that capability. For a fair comparison, we evaluate them on our internal “text-only” test set, which contains various publication papers and PDFs from the web.

Comparative OCR Performance

The following showcases Mistral OCR’s performance compared to industry-leading models such as Google Document AI, Azure OCR and the Gemini families. The data reflects evaluations on a variety of document types to show performance in real-world applications.

Key Metrics and Accuracy

Our testing focused on key areas critical to OCR performance. These include overall accuracy, mathematical expression recognition, multilingual support, handling of scanned documents, and table extraction. The results clearly position Mistral OCR as a leader in the field.

Image Extraction Capabilities

A distinct advantage of Mistral OCR is its ability to extract embedded images from documents alongside text. This multimodal capability sets it apart from other models that are limited to text-only extraction, and is part of what gives it such high performance across benchmarks.

We are empowering our beta customers to elevate their organizational knowledge by transforming their extensive document repositories into actions and solutions. Some of the key use cases where our technology is making a significant impact include:

Accelerating Scientific Discovery

Leading research institutions have been experimenting with Mistral OCR to convert scientific papers and journals into AI-ready formats, making them accessible to downstream intelligence engines. This has facilitated measurably faster collaboration and accelerated scientific workflows.

Preserving Cultural Heritage

Organizations and nonprofits that are custodians of heritage have been using Mistral OCR to digitize historical documents and artifacts, ensuring their preservation and making them accessible to a broader audience.

Enhancing Customer Support

Customer service departments are exploring Mistral OCR to transform documentation and manuals into indexed knowledge, reducing response times and improving customer satisfaction.

Unlocking Knowledge Across Industries

Mistral OCR has also been helping companies convert technical literature, engineering drawings, lecture notes, presentations, regulatory filings and much more into indexed, answer-ready formats, unlocking intelligence and productivity across millions of documents.

Experience Mistral OCR Today: Try the API and Explore the Possibilities

Jump into Action with the API

Mistral OCR capabilities are free to try on Le Chat. To try the API, head over to la Plateforme. We’d love to get your feedback; expect the model to continue to get even better in the weeks to come.

Custom Solutions for Your Organization

As part of our strategic engagement programs, we will also offer on-premises deployment on a selective basis. If you would like to explore self-deployment with us, please let us know.

Share Your Experience

We encourage you to share your experiences with Mistral OCR and let us know how it’s transforming your workflows. Your feedback is valuable as we continue to refine and improve our offering.
