
Tencent Hunyuan Open-Sources UniRL Multimodal RL Framework

One RL post-training loop spanning image, video, VLM, LLM and unified models.

Andrés Martínez, AI Content Writer
June 11, 2026 · 2 min read

Tencent's Hunyuan team has open-sourced UniRL, a reinforcement learning framework that runs a single post-training loop across very different model types. The GitHub repo describes the loop the usual way: generate samples, score them, compute advantages, update the policy, then sync weights back to rollout workers.
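
As a rough illustration of that five-step cycle, here is a toy sketch in Python. It is not UniRL code: the three-action softmax policy, the REWARDS table, and the function name run_loop are all assumptions made for the example, and the weight sync is just a list copy standing in for whatever transport a real framework uses.

```python
import math
import random

# Hypothetical reward for each of three actions; action 1 is the best.
REWARDS = [0.1, 0.9, 0.3]


def softmax(logits):
    """Turn logits into probabilities, shifted by the max for stability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def run_loop(steps=200, batch=16, lr=0.5, seed=0):
    """Toy generate / score / advantage / update / sync loop."""
    rng = random.Random(seed)
    n = len(REWARDS)
    trainer_logits = [0.0] * n             # weights held by the trainer
    rollout_logits = list(trainer_logits)  # copy held by the rollout worker

    for _ in range(steps):
        # 1. Generate samples with the rollout worker's copy of the policy.
        probs = softmax(rollout_logits)
        actions = rng.choices(range(n), weights=probs, k=batch)

        # 2. Score each sample with the (assumed) reward table.
        scores = [REWARDS[a] for a in actions]

        # 3. Compute advantages as score minus the batch mean.
        mean_score = sum(scores) / batch
        advantages = [s - mean_score for s in scores]

        # 4. Update the policy with a plain policy-gradient step.
        #    For a softmax policy, d log p(a) / d logit_k = 1[k == a] - p_k.
        grad = [0.0] * n
        for a, adv in zip(actions, advantages):
            for k in range(n):
                indicator = 1.0 if k == a else 0.0
                grad[k] += adv * (indicator - probs[k])
        trainer_logits = [
            w + lr * g / batch for w, g in zip(trainer_logits, grad)
        ]

        # 5. Sync the updated weights back to the rollout worker.
        rollout_logits = list(trainer_logits)

    return softmax(trainer_logits)
```

Calling run_loop() returns the final action probabilities; with these settings most of the probability mass should end up on action 1, the highest-reward choice.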

The pitch is breadth. Most RL stacks lock to one modality. UniRL applies that loop to text-to-image, text and image-to-video, vision-language, plain LLMs, prompt enhancers, and unified autoregressive-plus-diffusion architectures. Under the hood it leans on Ray, FSDP, a Transfer Queue, and LoRA or full-weight sync, with SGLang and vLLM-Omni as rollout engines.
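
Why offer LoRA sync alongside full-weight sync? A back-of-the-envelope sketch helps. In standard LoRA, a weight matrix W of shape d_out x d_in is adapted as W + (alpha / rank) * B @ A, so only the small factors B and A need to travel to rollout workers. The code below illustrates that general technique, not how UniRL implements its sync; the function names and the layer sizes are assumptions.

```python
def full_sync_params(d_out, d_in):
    """Parameters sent when the whole weight matrix is synced."""
    return d_out * d_in


def lora_sync_params(d_out, d_in, rank):
    """Parameters sent when only LoRA factors B (d_out x rank) and
    A (rank x d_in) are synced."""
    return rank * (d_out + d_in)


def merge_lora(W, B, A, alpha, rank):
    """Return W + (alpha / rank) * B @ A using plain nested lists."""
    scale = alpha / rank
    d_out, d_in = len(W), len(W[0])
    return [
        [
            W[i][j] + scale * sum(B[i][k] * A[k][j] for k in range(rank))
            for j in range(d_in)
        ]
        for i in range(d_out)
    ]


# Assumed example: one 4096 x 4096 projection with a rank-16 adapter.
full = full_sync_params(4096, 4096)      # 16,777,216 parameters
lora = lora_sync_params(4096, 4096, 16)  # 131,072 parameters
print(full // lora)                      # 128x fewer values to transfer
```

The trade-off is the usual one: LoRA moves far less data per sync, while full-weight sync is needed when every parameter is being trained.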

Two of the team's own algorithms anchor the release. Flow-DPPO targets flow matching and diffusion models, swapping the usual probability-ratio clip for an exact divergence-based trust-region mask. The second, DRPO, is token-level LLM RL with an advantage-weighted quadratic regularizer; the team says it holds up in FP8 where some baselines wobble. Those comparisons are self-reported, and there's no independent benchmarking yet.
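
The difference between a ratio clip and a divergence-based mask can be sketched in a few lines. What follows illustrates the general idea only, not the Flow-DPPO formulation, which is not spelled out here. It assumes Gaussian policy steps with a shared standard deviation, for which the KL divergence has the closed form ||mu_old - mu_new||^2 / (2 * sigma^2). The function names and the eps and delta thresholds are hypothetical.

```python
def ppo_clip_term(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate for one sample."""
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)


def gaussian_kl_same_sigma(mu_old, mu_new, sigma):
    """Exact KL between two Gaussians that share an isotropic sigma."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(mu_old, mu_new))
    return sq_dist / (2.0 * sigma ** 2)


def masked_term(ratio, advantage, kl, delta=0.01):
    """Assumed masking rule: keep the unclipped surrogate while the exact
    divergence stays inside the trust region, drop the sample otherwise."""
    return ratio * advantage if kl <= delta else 0.0


# The clip caps the surrogate based on the sampled ratio alone...
print(ppo_clip_term(1.5, 1.0))

# ...while the mask decides from the divergence between the two policies.
kl_small = gaussian_kl_same_sigma([0.0, 0.0], [0.1, 0.0], sigma=1.0)
kl_large = gaussian_kl_same_sigma([0.0, 0.0], [1.0, 1.0], sigma=1.0)
print(masked_term(1.5, 1.0, kl_small))  # kept: divergence within delta
print(masked_term(1.5, 1.0, kl_large))  # masked out: divergence too large
```

The design point is that a sampled probability ratio is a noisy, one-sample proxy for how far the policy has moved, whereas a closed-form divergence measures that distance directly.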

Supported models span Stable Diffusion 3 and 3.5, Qwen-Image, FLUX.2-Klein, WAN 2.1 and 2.2, HunyuanVideo 1.0 and 1.5, Qwen-VL, Qwen3, HunyuanImage3, and Bagel. The roadmap promises wider algorithm coverage for newer families and more reward backends. The repo lists no formal release version yet.


Bottom Line

UniRL runs one RL loop across 11 model families, from Stable Diffusion 3.5 to Qwen3, under Apache 2.0.

Quick Facts

  • Two team algorithms: Flow-DPPO and DRPO
  • DRPO paper: arXiv 2606.09821, released May 2026
  • Flow-DPPO manuscript dated June 8, 2026
  • 11 supported model families listed
  • Apache 2.0 license; built on Ray, FSDP, SGLang, vLLM-Omni
  • Algorithm comparisons are company-reported
Tags: Tencent Hunyuan, reinforcement learning, multimodal AI, open source, diffusion models, LLM training