Baidu Officially Launches ERNIE 5.0, Its 2.4-Trillion-Parameter Omni-Modal Model

Baidu on Thursday officially released ERNIE 5.0, a natively omni-modal AI model with 2.4 trillion parameters that jointly processes text, images, audio, and video. The model now powers the company's ERNIE Bot consumer product and is available to enterprise customers through the Qianfan cloud platform.

The Numbers

The January 15 LMArena ranking tells the story Baidu wants you to hear: ERNIE-5.0-0110 scored 1,460 points, placing eighth globally and making it the only Chinese model in the platform's top 10. It ranked ahead of OpenAI's GPT-5.1-High and Google's Gemini-2.5-Pro on the text leaderboard. In mathematical reasoning, it landed second worldwide, behind only GPT-5.2-High.

But timing matters here. While ERNIE 5.0 competes favorably against GPT-5.1 and Gemini 2.5 Pro on specific benchmarks, Western labs have already shipped GPT-5.2 and Gemini 3. Google DeepMind CEO Demis Hassabis said at Davos this week that Chinese AI models are roughly six months behind their American counterparts. The LMArena numbers suggest he's not far off.

The Sparse Activation Bet

ERNIE 5.0 runs on an ultra-sparse Mixture-of-Experts architecture, activating fewer than 3% of its 2.4 trillion parameters for each token it processes. The approach resembles DeepSeek's strategy: pack enormous capacity into the model, then route each token through a small subset of expert subnetworks to keep compute costs manageable.
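Baidu has not published ERNIE 5.0's router design, expert count, or top-k setting, so the sketch below is a generic top-k Mixture-of-Experts layer, not Baidu's implementation. It also spells out the arithmetic behind the headline figure: fewer than 3% of 2.4 trillion parameters is a ceiling of 72 billion active parameters per token. The expert count (128), top-k value (2), vector dimension, and toy "experts" are all hypothetical choices for illustration.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]

# Figures from Baidu's description: 2.4T total parameters, fewer than 3% active.
TOTAL_PARAMS = 2_400_000_000_000
ACTIVE_PERCENT_CEILING = 3


def active_param_ceiling(total: int, percent: int) -> int:
    """Upper bound on parameters used per token, computed in integer arithmetic."""
    return total * percent // 100


def softmax(xs: Sequence[float]) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and renormalise their gate weights."""
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    gates = softmax([logits[i] for i in chosen])
    return list(zip(chosen, gates))


def moe_layer(token: Vector, experts: List[Callable[[Vector], Vector]],
              router: List[Vector], k: int) -> Tuple[Vector, List[int]]:
    """Route one token through k of len(experts) experts; return output and experts used."""
    # One router logit per expert: dot product of the token with that expert's router row.
    logits = [sum(t * w for t, w in zip(token, row)) for row in router]
    out = [0.0] * len(token)
    used = []
    for idx, gate in top_k_route(logits, k):
        y = experts[idx](token)  # only the selected experts are evaluated
        out = [o + gate * yi for o, yi in zip(out, y)]
        used.append(idx)
    return out, used


def make_expert(scale: float) -> Callable[[Vector], Vector]:
    # Hypothetical toy expert: a scaled identity map standing in for a feed-forward block.
    return lambda v: [scale * x for x in v]


if __name__ == "__main__":
    ceiling = active_param_ceiling(TOTAL_PARAMS, ACTIVE_PERCENT_CEILING)
    print(f"Active-parameter ceiling: {ceiling:,}")
    rng = random.Random(0)
    num_experts, k, dim = 128, 2, 8  # hypothetical sizes, not ERNIE 5.0's
    experts = [make_expert(rng.uniform(0.5, 1.5)) for _ in range(num_experts)]
    router = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]
    token = [rng.gauss(0, 1) for _ in range(dim)]
    _, used = moe_layer(token, experts, router, k)
    print(f"Experts used: {used} ({k / num_experts:.2%} of expert weights)")
```

In a production sparse model the active share also includes components every token passes through, such as attention and embedding layers, so k divided by the expert count understates the fraction of parameters actually computed.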

Baidu CTO Haifeng Wang described the architecture as adopting "a unified auto-regression architecture for native full multimodal modelling." In plain terms: unlike models that bolt vision or audio modules onto a text backbone, ERNIE 5.0 was trained on all modalities from the start. Whether that translates to better real-world performance or just better benchmarks remains an open question.

Early Feedback Is Mixed

The benchmark charts look competitive. Baidu claims ERNIE 5.0 matches or beats GPT-5-High and Gemini 2.5 Pro on document understanding tasks like OCRBench, DocVQA, and ChartQA. On video generation benchmarks, the company says it matches or exceeds Google's Veo3 on semantic alignment and quality metrics. Audio understanding results on MM-AU and TUT2017 round out the omni-modal story.

But early users have found problems. Developer Lisan al Gaib reported on X that ERNIE 5.0 repeatedly called tools during SVG generation tasks even when explicitly told not to. "ERNIE 5.0 benchmarks looked insane until I tested it... unfortunately it's RL braindamaged or they have a serious issue with their chat platform / system prompt," they wrote. Baidu's developer support account acknowledged the bug and said a fix was coming.

This gap between benchmark performance and instruction following isn't unique to ERNIE 5.0. But for a model positioning itself as enterprise-ready, reliability matters more than leaderboard placement.

The Competitive Pressure

Baidu's AI consumer business has been bleeding users. ByteDance's Doubao commanded over 100 million monthly active users at one point. DeepSeek's cost-efficient models triggered a price war that forced Baidu to abandon its paid subscription model entirely last April. A Bloomberg Intelligence analysis from mid-2025 projected Baidu would continue losing market share to deep-pocketed rivals like Tencent and Alibaba.

The official ERNIE 5.0 launch coincided with Baidu announcing that its AI assistant had reached 200 million monthly active users, a milestone that puts it back among China's AI leaders, at least by user count. The company's stock jumped roughly 10% on the news.

But the enterprise side looks healthier than the consumer business. ERNIE now powers smart city command centers across China, serves all systemically important Chinese banks, and processes 16.5 billion API calls daily, according to company statements. This B2B stronghold has insulated Baidu from the consumer wars while funding continued development.

What's Missing

Baidu hasn't published a technical report for ERNIE 5.0. No model weights. No detailed documentation of the training process or architecture specifics beyond the sparse MoE descriptor. The company's most recent open release was ERNIE-4.5-VL-28B-A3B-Thinking, an Apache-licensed model that can manipulate images during reasoning.

This matters for anyone trying to verify Baidu's benchmark claims. The company says ERNIE 5.0 matches top models across "over 40 authoritative benchmark evaluations," but without weights or a technical report, outside testers can only probe the model through Baidu's own controlled environment, which rules out fully independent verification.

Users can test ERNIE 5.0 for free at ernie.baidu.com. Enterprise access runs through Baidu AI Cloud's Qianfan platform. Whether the instruction-following issues get resolved before customers start building on top of it is the question that matters.