Xiaomi Open-Sources 4B OneVL Autonomous Driving Model


Xiaomi's Embodied Intelligence team released OneVL this week, open-sourcing a 4B-parameter vision-language-action (VLA) model for autonomous driving trajectory prediction. The team published a project page and code on GitHub.

OneVL is built on Qwen3-VL-4B-Instruct. It compresses chain-of-thought (CoT) reasoning into 55 latent tokens (35 visual, 20 language) and uses two auxiliary decoders during training: one decodes the language CoT, and the other acts as a visual world model that predicts future frames. At inference both decoders are dropped, and the latent tokens are prefilled in a single parallel pass instead of being generated token by token.
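To make the training/inference split concrete, here is a minimal Python sketch of the bookkeeping described above: a 55-token latent budget (35 visual, 20 language), two auxiliary decoders that run only during training, and a parallel prefill that covers every latent position in one forward pass. It is not Xiaomi's implementation. All module names (for example `trajectory_head` and `visual_world_model_decoder`) are hypothetical labels, and the pass count is a simplification that ignores how expensive each pass is.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical illustration of the structure described in the article.
# None of these names come from Xiaomi's OneVL codebase.


@dataclass(frozen=True)
class LatentBudget:
    visual: int = 35    # latent tokens supervised by the visual world-model decoder
    language: int = 20  # latent tokens supervised by the language CoT decoder

    @property
    def total(self) -> int:
        return self.visual + self.language


def active_modules(training: bool) -> list[str]:
    """List the (hypothetical) modules that run in each phase."""
    modules = ["qwen3_vl_4b_backbone", "latent_tokens", "trajectory_head"]
    if training:
        # Auxiliary decoders supervise the latents during training only;
        # per the article, both are dropped at inference.
        modules += ["language_cot_decoder", "visual_world_model_decoder"]
    return modules


def reasoning_forward_passes(num_tokens: int, parallel_prefill: bool) -> int:
    """Simplified count of forward passes needed for the reasoning span.

    Token-by-token decoding needs one pass per token; prefilling latent
    placeholders processes all positions in a single pass.
    """
    return 1 if parallel_prefill else num_tokens


budget = LatentBudget()
print(budget.total)                                   # 55
print(active_modules(training=False))                 # backbone, latents, head only
print(reasoning_forward_passes(budget.total, True))   # 1
print(reasoning_forward_passes(budget.total, False))  # 55
```

The last line is purely illustrative: it counts decode steps for a token-by-token trace of the same length as the latent budget.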

Xiaomi reports a PDM-score of 88.84 on NAVSIM, ahead of the 8B baselines AdaThinkDrive (86.20) and LaST-VLA (87.30). The team calls OneVL the first latent CoT method to surpass explicit CoT on driving benchmarks. Those numbers are self-reported and haven't been independently replicated.

The latency claim deserves scrutiny. On Xiaomi's test setup, OneVL's prefill takes 4.46 seconds, roughly the same as an answer-only baseline. The 0.24-second figure that has circulated belongs to a separate MLP variant, which trades accuracy (a PDM-score of 86.83) for real-time speed.
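The trade-off between the two configurations comes down to simple arithmetic on Xiaomi's reported numbers. The Python sketch below computes the MLP variant's speedup and its PDM-score cost, and checks which of the 8B baselines it still beats. The inputs are the company-reported figures quoted in this article, the helper names are illustrative, and the speedup assumes both latency figures were measured on the same setup.

```python
# PDM-scores and latencies as reported by Xiaomi (not independently replicated).
MAIN_PDM, MAIN_LATENCY_S = 88.84, 4.46
MLP_PDM, MLP_LATENCY_S = 86.83, 0.24
BASELINES_8B = {"AdaThinkDrive": 86.20, "LaST-VLA": 87.30}


def speedup(slow_s: float, fast_s: float) -> float:
    """How many times faster the fast configuration is."""
    return slow_s / fast_s


def pdm_cost(high: float, low: float) -> float:
    """PDM-score points given up by the faster configuration."""
    return high - low


x = speedup(MAIN_LATENCY_S, MLP_LATENCY_S)
cost = pdm_cost(MAIN_PDM, MLP_PDM)
# Baselines whose reported PDM-score is below the MLP variant's.
still_ahead_of = [name for name, pdm in BASELINES_8B.items() if MLP_PDM > pdm]

print(f"MLP variant: {x:.1f}x faster, {cost:.2f} PDM points lower")
print("MLP variant still ahead of:", still_ahead_of)
```

On those numbers, the MLP variant runs roughly 18.6 times faster, gives up about 2 PDM points, and still edges out AdaThinkDrive but not LaST-VLA.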

The technical paper is available on arXiv. Xiaomi says it plans to fully open-source the weights and codebase so outside researchers can build on them.

Bottom Line

Xiaomi claims a PDM-score of 88.84 on NAVSIM for the 4B OneVL model, but the widely cited 0.24s latency comes from a stripped-down MLP variant, not the main system.

Quick Facts

Model size: 4B parameters, built on Qwen3-VL-4B-Instruct
Latent tokens: 55 total (35 visual, 20 language)
NAVSIM PDM-score: 88.84 (company-reported)
Main model latency: 4.46s prefill
MLP variant latency: 0.24s at 86.83 PDM-score


Andrés Martínez

AI Content Writer

Andrés reports on the AI stories that matter right now. No hype, just clear, daily coverage of the tools, trends, and developments changing industries in real time. He makes the complex feel routine.

