
Xiaomi Open-Sources OneVL Driving Model With Latent Reasoning

The 4B model squeezes chain-of-thought reasoning into 55 latent tokens, beating 8B baselines.

By Andrés Martínez, AI Content Writer
May 14, 2026 | 2 min read

Xiaomi's Embodied Intelligence team released OneVL this week, open-sourcing a 4B vision-language-action model for autonomous driving trajectory prediction. The team posted the project page alongside code on GitHub.

OneVL is built on Qwen3-VL-4B-Instruct. It compresses chain-of-thought (CoT) reasoning into 55 latent tokens (35 visual, 20 language) and trains with two auxiliary decoders: one that reconstructs the language CoT, and one that acts as a visual world model by predicting future frames. At inference, both decoders are dropped and the latent tokens are prefilled in a single parallel pass.
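As a rough illustration of that layout, the sketch below models the 55-token latent budget and the train-versus-inference split in plain Python. It is not Xiaomi's code: the class and function names, the head labels, and the ordering of visual latents before language latents are all assumptions made for the example. Only the token counts, the two auxiliary decoders, and the single-pass prefill come from the description above.

```python
from dataclasses import dataclass

# Token counts as reported for OneVL; everything else here is illustrative.
NUM_VISUAL_LATENTS = 35
NUM_LANGUAGE_LATENTS = 20


@dataclass
class LatentLayout:
    """Positions of the latent tokens appended after a prompt (hypothetical)."""
    prompt_len: int
    num_visual: int = NUM_VISUAL_LATENTS
    num_language: int = NUM_LANGUAGE_LATENTS

    @property
    def num_latents(self) -> int:
        return self.num_visual + self.num_language

    def visual_slice(self) -> slice:
        # Assumption: visual latents come first. The article does not say.
        return slice(self.prompt_len, self.prompt_len + self.num_visual)

    def language_slice(self) -> slice:
        start = self.prompt_len + self.num_visual
        return slice(start, start + self.num_language)


def active_heads(training: bool) -> list:
    """Auxiliary decoders exist only during training; head names are made up."""
    heads = ["trajectory"]
    if training:
        heads += ["language_cot_decoder", "visual_world_model_decoder"]
    return heads


def reasoning_forward_passes(num_tokens: int, parallel_prefill: bool) -> int:
    """Prefilled tokens share one pass; decoded tokens need one pass each."""
    return 1 if parallel_prefill else num_tokens


layout = LatentLayout(prompt_len=100)
print(layout.num_latents, layout.visual_slice(), layout.language_slice())
print(active_heads(training=True))
print(active_heads(training=False))
```

The last helper captures why the design matters for latency: a fixed block of latent tokens can be processed alongside the prompt in one pass, whereas an explicit chain of thought of the same length would be decoded one token at a time.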

Xiaomi reports a PDM-score of 88.84 on NAVSIM, ahead of the 8B baselines AdaThinkDrive (86.20) and LaST-VLA (87.30). The team calls OneVL the first latent CoT method to surpass explicit CoT on driving benchmarks. Those numbers are self-reported and haven't been independently replicated.

The latency claim deserves scrutiny. OneVL's prefill takes 4.46 seconds on the test setup, roughly matching an answer-only baseline. The 0.24-second figure that has circulated belongs to a separate MLP variant, which trades accuracy (86.83 PDM-score) for real-time speed.
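To make the trade-off concrete, the short script below puts the reported figures side by side. It uses only the company-reported numbers quoted in this article; the dictionary keys and helper names are invented for the example, and the latency comparison assumes both figures were measured on the same test setup, which the article does not confirm.

```python
# Company-reported figures quoted in this article (not independently verified).
MAIN = {"pdm": 88.84, "latency_s": 4.46}
MLP_VARIANT = {"pdm": 86.83, "latency_s": 0.24}
BASELINES_8B = {"AdaThinkDrive": 86.20, "LaST-VLA": 87.30}


def speedup(slow_s: float, fast_s: float) -> float:
    """How many times faster the second latency is than the first."""
    return slow_s / fast_s


def pdm_margin(model_pdm: float, other_pdm: float) -> float:
    """PDM-score difference in points, rounded to two decimals."""
    return round(model_pdm - other_pdm, 2)


ratio = speedup(MAIN["latency_s"], MLP_VARIANT["latency_s"])
cost = pdm_margin(MAIN["pdm"], MLP_VARIANT["pdm"])
print(f"MLP variant: {ratio:.1f}x faster, {cost} PDM points lower")

for name, score in BASELINES_8B.items():
    print(
        f"vs {name}: main {pdm_margin(MAIN['pdm'], score):+.2f}, "
        f"MLP variant {pdm_margin(MLP_VARIANT['pdm'], score):+.2f}"
    )
```

On these numbers the MLP variant is about 18.6 times faster but gives up roughly two PDM points, which puts it above AdaThinkDrive yet below LaST-VLA. The 88.84 score and the 0.24-second latency therefore describe two different configurations.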

The technical paper is up on arXiv. Xiaomi says it plans to fully open-source weights and codebase for outside researchers to build on.


Bottom Line

OneVL claims 88.84 PDM-score on NAVSIM with a 4B model, but the headline 0.24s latency is from a stripped-down MLP variant, not the main system.

Quick Facts

  • Model size: 4B parameters, built on Qwen3-VL-4B-Instruct
  • Latent tokens: 55 total (35 visual, 20 language)
  • NAVSIM PDM-score: 88.84 (company-reported)
  • Main model latency: 4.46s prefill
  • MLP variant latency: 0.24s at 86.83 PDM-score

Tags: Xiaomi, autonomous driving, VLA models, open source, Qwen3-VL, latent reasoning
