NVIDIA Lyra 2.0: Open-Source 3D World Generation Framework

Abstract 3D environment generated from a single photograph, showing navigable architecture with Gaussian splat rendering

NVIDIA's Spatial Intelligence Lab released Lyra 2.0, a framework that converts a single photograph into a navigable, geometrically consistent 3D world. The research paper dropped April 14, with model weights and source code available under Apache 2.0. Commercial use is permitted.

The core problem: AI video models lose track of what a scene looks like over long camera trajectories. Walk far enough, turn around, and the room you started in has warped beyond recognition. Two specific failure modes cause this. Spatial forgetting, where previously seen regions fall outside the model's context window and get hallucinated on revisit. And temporal drifting, where small errors compound frame over frame until the whole scene degrades. Lyra 2.0 tackles the first by caching per-frame 3D geometry to retrieve past views and establish spatial correspondences. For drift, it trains on its own degraded outputs, learning to correct accumulated errors rather than amplify them.

Built on Wan 2.1-14B (a diffusion transformer for video generation), the system produces frames at 832x480 resolution in 35 denoising steps. A distilled variant cuts that to 4 steps. Outputs come as 3D Gaussian Splats and surface meshes that plug directly into real-time rendering engines. NVIDIA's project page shows exports into Isaac Sim for robotics training, where a delivery robot can train in a simulated facility built from a single photo of the real space.

On DL3DV and Tanks and Temples benchmarks, NVIDIA reports an LPIPS of 0.552, FID of 51.33, and style consistency of 85.07%. These are company-reported numbers with no independent verification yet. Lyra 1.0 shipped in September 2025 with the core single-image-to-3D pipeline; this release focuses on scaling to larger, more complex environments. No managed inference endpoint exists at launch. You'll need your own GPU hardware, and NVIDIA says it's only been tested on H100 and A100 cards.

Bottom Line

Lyra 2.0 ships under Apache 2.0 with weights on Hugging Face, but requires H100/A100 GPUs with at least 43GB VRAM to run locally.

Quick Facts

License: Apache 2.0, commercial use permitted
Base model: Wan 2.1-14B diffusion transformer
Resolution: 832x480, 35 denoising steps (4 steps distilled)
Benchmarks (company-reported): LPIPS 0.552, FID 51.33, style consistency 85.07%
Hardware: tested on H100/A100 only, ~43GB VRAM with full offloading

Tags:NVIDIA3D generationLyra 2.0open sourcerobotics simulationGaussian splattingcomputer vision

Andrés Martínez

AI Content Writer

Andrés reports on the AI stories that matter right now. No hype, just clear, daily coverage of the tools, trends, and developments changing industries in real time. He makes the complex feel routine.

NVIDIA Open-Sources Lyra 2.0 for 3D World Generation from Single Images

Bottom Line

Quick Facts

Andrés Martínez

Related Articles

Reve 2.0 Launches, Hits No. 2 on Image Arena

Utopai Studios Launches PAI 2.0 Video Platform

Ideogram Releases Ideogram 4, Its First Open-Weight Image Model

Stay Ahead of the AI Curve