Tencent's Robotics X and Hunyuan Vision teams dropped HY-Embodied-0.5 this week, a pair of vision-language models built specifically for agents that operate in the physical world. The 2B-parameter variant is open-sourced on Hugging Face with inference code. A larger 32B model handles complex reasoning but isn't publicly available yet.
The pitch: general-purpose VLMs aren't great at the stuff embodied agents actually need, like spatial perception, depth understanding, and action planning. HY-Embodied tries to close that gap. The smaller model uses a Mixture-of-Transformers architecture with 4B total parameters but only 2.2B activated during inference, so it runs at dense-2B speeds while drawing on a larger parameter pool. A latent-token scheme compresses visual input into more compact representations, which Tencent says enables finer-grained perception.
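Tencent hasn't published the routing details here, but the "4B total, 2.2B activated" split is the hallmark of architectures that keep separate parameter sets per modality while each token only flows through one of them. A toy numpy sketch of that idea (the two-modality split, sizes, and routing rule are illustrative assumptions, not the actual HY-Embodied design):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8        # toy hidden size
ffn = 4 * d  # toy feed-forward width

# One FFN weight set per modality: the *total* parameter count spans all
# of them, but each token only activates the weights for its modality.
modalities = ["text", "vision"]
W_in = {m: rng.standard_normal((d, ffn)) * 0.1 for m in modalities}
W_out = {m: rng.standard_normal((ffn, d)) * 0.1 for m in modalities}

def mot_ffn(x, token_modality):
    """Route each token through its modality-specific FFN."""
    out = np.empty_like(x)
    for i, m in enumerate(token_modality):
        h = np.maximum(x[i] @ W_in[m], 0.0)  # ReLU feed-forward
        out[i] = h @ W_out[m]
    return out

tokens = rng.standard_normal((4, d))
out = mot_ffn(tokens, ["text", "vision", "vision", "text"])

total_params = sum(W_in[m].size + W_out[m].size for m in modalities)
active_params = W_in["text"].size + W_out["text"].size  # per-token cost
print(out.shape, total_params, active_params)
```

In this toy, each token touches half the total parameters; in HY-Embodied's MoT-2B the reported ratio is 2.2B of 4B, which is why inference cost tracks a dense ~2B model rather than the full pool.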
Tencent claims the MoT-2B outperforms similarly sized models across 16 benchmarks, though these are company-reported numbers. The 32B variant reportedly matches Gemini 3.0 Pro on embodied tasks. Both claims lack independent verification so far. A technical paper details the training pipeline, which includes a self-evolving post-training loop and on-policy distillation from the large model to the small one.
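The paper's "on-policy" qualifier matters: instead of training the small model on a fixed teacher-generated dataset, the student learns on states it reaches itself, with the teacher supervising its next-token distribution there. A minimal single-step numpy sketch of that loss shape (the distributions are random stand-ins, not HY-Embodied's; real training applies this per token over full student rollouts):

```python
import numpy as np

rng = np.random.default_rng(0)
V = 5  # toy vocabulary size

def softmax(z):
    z = z - z.max()  # numerically stable softmax
    e = np.exp(z)
    return e / e.sum()

# Stand-in next-token distributions for teacher (large) and student (small).
p_teacher = softmax(rng.standard_normal(V))
p_student = softmax(rng.standard_normal(V))

# On-policy: the next token is sampled from the *student* itself, so
# supervision lands on states the student actually visits at inference.
token = rng.choice(V, p=p_student)

# Reverse KL penalizes the student for putting probability mass where
# the teacher would not; minimizing it pulls the student toward the teacher.
reverse_kl = float(np.sum(p_student * np.log(p_student / p_teacher)))
print(token, reverse_kl)
```

The design choice this illustrates: sampling from the student rather than the teacher avoids the train/inference mismatch of classic offline distillation, at the cost of needing the teacher in the loop during post-training.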
The real test is downstream. Tencent says HY-Embodied already works as a foundation for Vision-Language-Action pipelines, with results in physical robot control experiments. No pricing or API access announced. The 2B weights are available now.
Bottom Line
Tencent's open-sourced MoT-2B embodied model activates only 2.2B of its 4B total parameters during inference, running at dense-2B speeds and targeting edge robotics deployment.
Quick Facts
- MoT-2B: 4B total parameters, 2.2B activated
- 32B variant: 407B total parameters, 32B activated (not yet open-sourced)
- Outperforms peers on 16 benchmarks (company-reported)
- 32B performance comparable to Gemini 3.0 Pro (company-reported)
- Released April 9, 2026