Tencent's Hunyuan AI team has open-sourced HY-WorldPlay, a streaming video diffusion model that turns a single image or text prompt into an interactive, explorable environment running at 24 frames per second. The system, part of the broader Hunyuan 3D ecosystem, shipped its initial release on December 17, 2025, with a major follow-up on January 6, 2026, adding open training code, a lighter 5B-parameter model, and a waitlist-free online demo.
A quick clarification on what this actually is: WorldPlay doesn't build traditional 3D meshes. It's a video diffusion model that predicts the next 16 frames of video based on your keyboard and mouse input, creating the illusion of navigating a 3D space. The trick is consistency. Leave an area and come back, and the geometry holds. That's the hard part, and it's where most competing approaches fall apart. The research paper credits four techniques: a dual action representation for handling both keyboard and camera-pose inputs, a reconstituted context memory that keeps old frames accessible, an RL post-training step called WorldCompass, and a distillation method called Context Forcing that maintains long-range coherence at real-time speeds.
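The loop described above — user action in, short chunk of frames out, with older frames retained for consistency — can be sketched in a few lines. This is an illustrative simulation only: the chunk size of 16 comes from the article, but the context length, the function names, and the stub denoiser are hypothetical stand-ins, not the actual HY-WorldPlay API.

```python
from collections import deque

CHUNK = 16    # frames generated per step (per the article)
CTX_LEN = 48  # hypothetical context-memory length, purely illustrative

def denoise_chunk(context, action, t0):
    # Stub denoiser: the real model runs iterative diffusion denoising
    # conditioned on cached context frames plus an action embedding.
    # Here each "frame" is just a timestamped record so the loop runs.
    return [{"t": t0 + i, "action": action} for i in range(CHUNK)]

def stream(actions):
    """Autoregressive streaming loop: each keyboard/camera action
    conditions the next 16-frame chunk, and old frames stay in a
    bounded memory (a loose analogue of the paper's reconstituted
    context memory) so revisited areas can stay geometrically stable."""
    memory = deque(maxlen=CTX_LEN)  # oldest frames age out
    video, t = [], 0
    for action in actions:
        chunk = denoise_chunk(list(memory), action, t)
        memory.extend(chunk)
        video.extend(chunk)
        t += CHUNK
    return video

frames = stream(["W", "W", "turn_left"])  # 3 inputs -> 3 x 16 frames
```

The point of the sketch is the control flow, not the model: generation is chunked and conditioned on both the action stream and a bounded frame memory, which is why long-range consistency is the hard engineering problem rather than per-frame quality.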
Two model variants are available on Hugging Face: an 8B version built on HunyuanVideo 1.5 (recommended, with better action control) and a lighter 5B version based on WAN that runs on consumer GPUs at reduced quality. The full training pipeline is now open; the models were trained on 320,000 real and synthetic videos. All benchmarks in the paper are self-reported, and independent testing hasn't surfaced yet. The 24 FPS figure requires multi-GPU inference across 8 GPUs with sequence parallelism, so don't expect that on a single card.
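To make the 8-GPU caveat concrete: sequence parallelism splits the frame-token sequence across devices so each GPU denoises only its slice, with cross-rank communication during attention. The sketch below simulates just the sharding step in plain Python; the function name and even-split strategy are assumptions for illustration, not HY-WorldPlay's actual partitioning code.

```python
def shard_sequence(tokens, world_size):
    """Split a sequence of frame tokens into contiguous per-rank slices,
    as sequence-parallel inference would before dispatching to each GPU.
    Attention across shards needs communication, which is omitted here."""
    per_rank = -(-len(tokens) // world_size)  # ceiling division
    return [tokens[r * per_rank:(r + 1) * per_rank]
            for r in range(world_size)]

shards = shard_sequence(list(range(100)), 8)  # 8 "GPUs"
```

Each rank only holds (and computes over) roughly 1/8 of the sequence, which is what buys the real-time throughput; on a single card the whole sequence sits on one device and frame rate drops accordingly.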
Bottom Line
WorldPlay ships two open-source model variants (8B and 5B) with full training code, though the 24 FPS claim requires 8-GPU inference.
Quick Facts
- 24 FPS streaming video generation at 720p (company-reported, 8-GPU setup)
- Two models: WorldPlay-8B (HunyuanVideo) and WorldPlay-5B lite (WAN)
- 155 GB total model weights on Hugging Face
- Training dataset: 320,000 real and synthetic videos
- Initial release: December 17, 2025; training code released January 6, 2026