SkyReels-V4: Skywork's Unified Video-Audio Generation Model


Skywork AI, the research arm of Chinese gaming company Kunlun Tech, dropped the technical report for SkyReels-V4 this week. The model unifies video generation, inpainting, and editing with synchronized audio output under one architecture. That's a tall claim, but the team's track record of open-sourcing previous SkyReels versions (V1 through V3 all shipped weights on Hugging Face) lends it some credibility.

The core idea: a dual-stream Multimodal Diffusion Transformer where one branch handles video and the other handles audio. Both share a text encoder built on a multimodal large language model, which lets the system accept text, images, video clips, masks, and audio references as input. The video branch uses channel concatenation to treat image-to-video, video extension, and editing as variants of inpainting. Clever, if it holds up in practice.
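To make the "everything is inpainting" framing concrete, here is a minimal NumPy sketch of channel concatenation. It is an illustration under stated assumptions, not Skywork's implementation: the function names, tensor shapes, mask layouts, and the choice of 1 for "known content" are all hypothetical, and the report does not confirm any of them. The point is only that each task differs in its mask, while the tensor handed to the model keeps the same shape.

```python
import numpy as np

def task_mask(task, frames, height, width, known_frames=1):
    """Return a (T, 1, H, W) mask where 1 marks content the model must keep.

    The task names and mask layouts are illustrative assumptions.
    """
    mask = np.zeros((frames, 1, height, width), dtype=np.float32)
    if task == "text_to_video":
        pass  # nothing is known, so every pixel is generated
    elif task == "image_to_video":
        mask[0] = 1.0  # only the first frame is given
    elif task == "video_extension":
        mask[:known_frames] = 1.0  # a leading clip is given
    elif task == "editing":
        mask[:] = 1.0  # keep the whole clip...
        # ...except a central region that should be regenerated
        mask[:, :, height // 4 : 3 * height // 4, width // 4 : 3 * width // 4] = 0.0
    else:
        raise ValueError(f"unknown task: {task}")
    return mask

def build_inpainting_input(noisy, reference, keep_mask):
    """Concatenate noisy latents, masked reference, and mask along channels.

    noisy and reference have shape (T, C, H, W); keep_mask has shape (T, 1, H, W).
    The result has 2 * C + 1 channels regardless of the task.
    """
    masked_reference = reference * keep_mask  # hide everything to be generated
    return np.concatenate([noisy, masked_reference, keep_mask], axis=1)

# Toy sizes, chosen only for the demonstration.
T, C, H, W = 8, 4, 16, 16
rng = np.random.default_rng(0)
noisy = rng.standard_normal((T, C, H, W)).astype(np.float32)
reference = rng.standard_normal((T, C, H, W)).astype(np.float32)

for task in ("text_to_video", "image_to_video", "video_extension", "editing"):
    mask = task_mask(task, T, H, W, known_frames=3)
    model_input = build_inpainting_input(noisy, reference, mask)
    print(task, model_input.shape, "known pixels:", int(mask.sum()))
```

Because the input shape never changes, a single network can in principle serve all four tasks; only the mask and the reference content vary.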

On paper, SkyReels-V4 outputs 1080p video at 32 FPS for up to 15 seconds with temporally aligned audio. To keep that computationally feasible, the team generates low-resolution full sequences alongside high-resolution keyframes, then runs super-resolution and frame interpolation. The report claims third place on the Artificial Analysis Text-to-Video with Audio Arena as of February 24, behind Veo 3.1 and Grok's video model, though arena rankings shift quickly.
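A rough back-of-the-envelope calculation shows why the staged approach matters. The sketch below counts raw output pixels only, and the low-resolution size (one quarter of 1080p per side) and the keyframe rate (one per second) are assumptions made for illustration; the article does not give the report's actual values. It also ignores the cost of the super-resolution and interpolation passes themselves.

```python
def pixel_count(width, height, frames):
    """Total pixels across a clip of the given size and length."""
    return width * height * frames

FPS = 32
SECONDS = 15
FRAMES = FPS * SECONDS  # 480 frames for a full-length clip

# Generating every frame directly at 1080p (1920x1080).
full_pixels = pixel_count(1920, 1080, FRAMES)

# Staged alternative, with ASSUMED settings (not taken from the report):
# a 480x270 pass over all frames plus one 1080p keyframe per second.
low_res_pixels = pixel_count(480, 270, FRAMES)
keyframe_pixels = pixel_count(1920, 1080, SECONDS)
staged_pixels = low_res_pixels + keyframe_pixels

ratio = full_pixels / staged_pixels
print(f"frames: {FRAMES}")
print(f"direct 1080p pixels: {full_pixels:,}")
print(f"staged pixels:       {staged_pixels:,}")
print(f"reduction factor:    {ratio:.1f}x")
```

Under these assumed settings the diffusion model would handle roughly a tenth of the pixels it would face generating every 1080p frame directly, with the remainder left to the cheaper upscaling and interpolation stages.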

Skywork has not announced a release date for weights or code. The company has consistently published its previous models as open source, but V4 remains report-only for now.

Bottom Line

SkyReels-V4 claims third place on the Artificial Analysis video-audio arena, but weights and code haven't been released yet.

Quick Facts

Output: up to 1080p at 32 FPS, 15 seconds max
Architecture: dual-stream MMDiT with shared MMLM text encoder
Arena rank: 3rd on Artificial Analysis Text-to-Video with Audio (company-reported, as of Feb 24)
Developer: Skywork AI (Kunlun Tech)
No model weights or code released yet

Tags: SkyReels, Skywork AI, video generation, audio generation, diffusion transformer, Kunlun Tech, open source AI

Andrés Martínez

AI Content Writer

Andrés reports on the AI stories that matter right now. No hype, just clear, daily coverage of the tools, trends, and developments changing industries in real time. He makes the complex feel routine.
