Tencent's Hunyuan AI team has open-sourced HY-WorldPlay, a streaming video diffusion model that turns a single image or text prompt into an interactive, explorable environment running at 24 frames per second. The system, part of the broader Hunyuan 3D ecosystem, shipped its initial release on December 17, 2025, with a major follow-up on January 6, 2026, adding open training code, a lighter 5B-parameter model, and a waitlist-free online demo.
A quick clarification on what this actually is: WorldPlay doesn't build traditional 3D meshes. It's a video diffusion model that predicts the next 16 frames of video based on your keyboard and mouse input, creating the illusion of navigating a 3D space. The trick is consistency. Leave an area and come back, and the geometry holds. That's the hard part, and it's where most competing approaches fall apart. The research paper credits four techniques: a dual action representation for handling both keyboard and camera-pose inputs, a reconstituted context memory that keeps old frames accessible, an RL post-training step called WorldCompass, and a distillation method called Context Forcing that maintains long-range coherence at real-time speeds.
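The loop described above — user action in, short chunk of frames out, with older frames retained for consistency — can be sketched in a few lines. This is an illustrative simulation only: the chunk size of 16 comes from the article, but the context length, the function names, and the stub denoiser are hypothetical stand-ins, not the actual HY-WorldPlay API.

```python
from collections import deque

CHUNK = 16    # frames generated per step (per the article)
CTX_LEN = 48  # hypothetical context-memory length, purely illustrative

def denoise_chunk(context, action, t0):
    # Stub denoiser: the real model runs iterative diffusion denoising
    # conditioned on cached context frames plus an action embedding.
    # Here each "frame" is just a timestamped record so the loop runs.
    return [{"t": t0 + i, "action": action} for i in range(CHUNK)]

def stream(actions):
    """Autoregressive streaming loop: each keyboard/camera action
    conditions the next 16-frame chunk, and old frames stay in a
    bounded memory (a loose analogue of the paper's reconstituted
    context memory) so revisited areas can stay geometrically stable."""
    memory = deque(maxlen=CTX_LEN)  # oldest frames age out
    video, t = [], 0
    for action in actions:
        chunk = denoise_chunk(list(memory), action, t)
        memory.extend(chunk)
        video.extend(chunk)
        t += CHUNK
    return video

frames = stream(["W", "W", "turn_left"])  # 3 inputs -> 3 x 16 frames
```

The point of the sketch is the control flow, not the model: generation is chunked and conditioned on both the action stream and a bounded frame memory, which is why long-range consistency is the hard engineering problem rather than per-frame quality.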
Two model variants are available on Hugging Face: an 8B version built on HunyuanVideo 1.5 (recommended, with better action control) and a lighter 5B version based on WAN that runs on consumer GPUs at reduced quality. The full training pipeline is now open; the models were trained on 320,000 real and synthetic videos. All benchmarks in the paper are self-reported, and independent testing hasn't surfaced yet. The 24 FPS figure requires multi-GPU inference across 8 GPUs with sequence parallelism, so don't expect that on a single card.
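To make the 8-GPU caveat concrete: sequence parallelism splits the frame-token sequence across devices so each GPU denoises only its slice, with cross-rank communication during attention. The sketch below simulates just the sharding step in plain Python; the function name and even-split strategy are assumptions for illustration, not HY-WorldPlay's actual partitioning code.

```python
def shard_sequence(tokens, world_size):
    """Split a sequence of frame tokens into contiguous per-rank slices,
    as sequence-parallel inference would before dispatching to each GPU.
    Attention across shards needs communication, which is omitted here."""
    per_rank = -(-len(tokens) // world_size)  # ceiling division
    return [tokens[r * per_rank:(r + 1) * per_rank]
            for r in range(world_size)]

shards = shard_sequence(list(range(100)), 8)  # 8 "GPUs"
```

Each rank only holds (and computes over) roughly 1/8 of the sequence, which is what buys the real-time throughput; on a single card the whole sequence sits on one device and frame rate drops accordingly.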
Bottom Line
WorldPlay ships two open-source model variants (8B and 5B) with full training code, though the 24 FPS claim requires 8-GPU inference.
Quick Facts
- 24 FPS streaming video generation at 720p (company-reported, 8-GPU setup)
- Two models: WorldPlay-8B (HunyuanVideo) and WorldPlay-5B lite (WAN)
- 155 GB total model weights on Hugging Face
- Training dataset: 320,000 real and synthetic videos
- Initial release: December 17, 2025; training code released January 6, 2026