Image Generation

Tencent's HY-WU Generates Custom LoRA Weights Per Image, Beating Most Closed-Source Editors

A new framework synthesizes instance-specific adapter weights on the fly, skipping the shared-adapter compromise entirely.

Oliver Senti, Senior AI Editor
March 9, 2026 · 5 min read
[Image: Abstract visualization of neural network weight matrices being dynamically generated and injected into a larger model architecture]

Tencent's Hunyuan team released HY-WU this week, a framework that generates fresh LoRA adapter weights for every single input during inference. No fine-tuning, no test-time optimization. An 8-billion-parameter generator model looks at your image and text instruction, builds a custom set of LoRA matrices on the spot, and injects them into the frozen backbone. The idea is simple enough to state in one sentence, but the engineering required to make it work at scale on an 80B-parameter base model is not.

The gradient conflict problem

Anyone who has tried training a single LoRA adapter across contradictory editing tasks knows the frustration. Tell a model to both blur and deblur, age and de-age faces, and the gradients start fighting each other. The Tencent team measured this directly in their technical report: cosine similarity between gradients from conflicting tasks averaged around negative 0.30. The tasks are literally pulling weights in opposite directions.
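The conflict measurement is just cosine similarity between flattened per-task gradients. A toy sketch (the vectors below are made up to illustrate anti-alignment, not taken from the report):

```python
import numpy as np

def grad_cosine(g1, g2):
    """Cosine similarity between two flattened gradient vectors."""
    return float(np.dot(g1, g2) / (np.linalg.norm(g1) * np.linalg.norm(g2)))

# Toy illustration: "blur" and "deblur" pull a shared weight in roughly
# opposite directions, so their gradients anti-align (similarity < 0).
blur_grad   = np.array([ 0.8, -0.3,  0.5])
deblur_grad = np.array([-0.7,  0.4, -0.2])

print(grad_cosine(blur_grad, deblur_grad))  # negative, i.e. conflicting
```

A value near -0.30 averaged over real task pairs, as the report measures, means the shared adapter spends much of its capacity canceling itself out.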

Standard approaches deal with this by accepting the compromise. Train one shared adapter, and it does okay at everything but excels at nothing. Or train separate adapters per task, which doesn't scale and requires knowing your task categories in advance. Full supervised fine-tuning with many more trainable parameters? According to the report, it produces results comparable to a shared LoRA anyway, because both approaches collapse to a single fixed point in weight space at inference time.

HY-WU sidesteps the whole thing. Instead of learning one adapter that tries to serve all masters, a generator network produces a unique adapter for each input.

How it actually works

The generator takes a joint representation of the input image and text instruction through a SigLIP2 encoder, then synthesizes roughly 0.72 billion LoRA parameters that get injected into HunyuanImage-3.0-Instruct, the 80B MoE model Tencent open-sourced in January. The whole thing is trained end-to-end through the downstream editing loss. No pre-collected adapter checkpoints, no two-stage pipeline.
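The mechanism is essentially a hypernetwork over LoRA factors. A minimal NumPy sketch of the idea, with toy dimensions and a linear generator that stand in for HY-WU's actual 8B architecture (all names and sizes here are illustrative assumptions, not Tencent's code):

```python
import numpy as np

rng = np.random.default_rng(0)
D_COND, D_IN, D_OUT, RANK = 64, 128, 128, 4   # toy sizes, not HY-WU's

# Frozen backbone weight: stands in for one linear layer of the 80B model.
W_frozen = rng.standard_normal((D_OUT, D_IN)) * 0.02

# Toy "generator": a linear hypernetwork mapping the joint image+text
# embedding to the entries of the low-rank LoRA factors A and B.
G_A = rng.standard_normal((RANK * D_IN, D_COND)) * 0.01
G_B = rng.standard_normal((D_OUT * RANK, D_COND)) * 0.01

def generate_lora(cond):
    """Synthesize instance-specific LoRA factors from a conditioning vector."""
    A = (G_A @ cond).reshape(RANK, D_IN)
    B = (G_B @ cond).reshape(D_OUT, RANK)
    return A, B

def forward(x, cond):
    """Frozen layer plus the freshly generated low-rank update B @ A."""
    A, B = generate_lora(cond)
    return W_frozen @ x + B @ (A @ x)

cond = rng.standard_normal(D_COND)  # joint image+text embedding (e.g. SigLIP2)
x = rng.standard_normal(D_IN)
y = forward(x, cond)
```

Training end-to-end through the editing loss means gradients flow into `G_A` and `G_B` while `W_frozen` stays fixed; each input gets its own `A`, `B` at inference with no optimization loop.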

The GitHub repo and model weights are both available under the Tencent Hunyuan Community License. But before you get excited: you'll need 8x40GB or 4x80GB of VRAM to run this. That's the base model plus the generator sitting on top of it.

Does it work, though?

In pairwise human evaluation using the GSB (Good/Same/Bad) protocol across over 1,000 editing cases, HY-WU claims win rates of 67 to 78 percent against open-source editors including Step1X, Qwen, LongCat, and FLUX-based models. Those are Tencent's own numbers from their own evaluation, so the usual caveats apply. But the margins are wide enough to be interesting even after discounting for home-field advantage.
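How a GSB tally becomes a single win-rate number depends on tie handling, which the report doesn't spell out. A minimal sketch of the two common conventions (the tallies below are illustrative, not from Tencent's evaluation):

```python
def gsb_win_rate(good, same, bad, count_ties_half=True):
    """Pairwise win rate from Good/Same/Bad tallies.

    Tie handling is an assumption: the report doesn't specify whether
    'Same' verdicts count as half-wins or are excluded entirely.
    """
    if count_ties_half:
        return (good + 0.5 * same) / (good + same + bad)
    return good / (good + bad)

# e.g. 600 Good, 150 Same, 250 Bad out of 1,000 editing cases
print(gsb_win_rate(600, 150, 250))         # 0.675
print(gsb_win_rate(600, 150, 250, False))  # higher: ties dropped
```

The choice can shift a headline number by a few points, which matters when the reported closed-source margins are themselves only five or six points.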

The closed-source comparisons are tighter. HY-WU reportedly edges out ByteDance's Seedream 4.5 at 55.6% and GPT Image 1.5 at 55.5% in the same pairwise protocol. Those are modest margins, the kind that could flip with different prompt sets or evaluator pools. I'd want to see independent replication before reading too much into them.

Google's Nano Banana 2 and Nano Banana Pro remain ahead: in the same protocol, those models win 52.4% and 53.8% of comparisons against HY-WU, respectively. The project page acknowledges this directly, noting the gap is modest given that those commercial systems likely use larger backbones and proprietary training data. A fair point, but "we lost by a little" is still losing.

The ablation that matters

Here's what caught my attention. When the team scrambled the conditioning signal (feeding the generator shuffled or averaged representations instead of the actual input), performance collapsed back to baseline. That's the control experiment you want to see. It means the gains come from the conditional routing itself, not from simply having more parameters in the system. A bigger model alone doesn't explain the results.
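A toy sketch of why the averaged-conditioning control is the right test. With any deterministic generator (a plain linear map below, purely illustrative, not Tencent's architecture), feeding every instance the batch-mean embedding produces identical adapters, i.e. exactly the shared-adapter fixed point the method is supposed to escape:

```python
import numpy as np

rng = np.random.default_rng(1)
conds = rng.standard_normal((4, 64))     # per-instance conditioning embeddings
G = rng.standard_normal((8, 64)) * 0.01  # toy linear generator

adapters = conds @ G.T                   # one adapter vector per instance
assert not np.allclose(adapters[0], adapters[1])  # instance-specific

# Averaged-conditioning control: every instance now gets the SAME adapter,
# so the system collapses back to a single fixed operating point.
avg = conds.mean(axis=0)
collapsed = np.tile(avg @ G.T, (4, 1))
assert np.allclose(collapsed[0], collapsed[1])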

Full SFT with substantially more trainable parameters produced results on par with a shared LoRA. Both hit the same wall: a single fixed operating point at inference time, regardless of what you throw at the model. HY-WU avoids that wall by construction.

What's missing

The report calls this "Part I" of a series on functional memory for generative models. They mention plans to compare against retrieval-augmented approaches, develop online learning protocols, and explore scaling the generator independently from the base model. They also want to extend beyond LoRA to other operator interfaces, and push into video and agent systems.

That's a lot of future work. For now, what we have is a proof of concept on image editing. And an important question the paper doesn't address: how much of the 6.7% computational overhead they report depends on their custom CUDA kernels? If you're trying to replicate this in vanilla PyTorch without Tencent's infrastructure, expect worse numbers. Maybe much worse.

The next FTC filing deadline is irrelevant here, so I won't pretend there's a neat regulatory hook. What matters is whether conditional weight generation becomes a standard tool in the adapter toolbox, or stays a research curiosity that only works when you have Tencent-scale compute sitting around. The VRAM requirements alone will keep most researchers from even trying.

Tags: Tencent, HY-WU, LoRA, image editing, conditional adaptation, Hunyuan, generative AI, computer vision
Oliver Senti

Senior AI Editor

Former software engineer turned tech writer, Oliver has spent the last five years tracking the AI landscape. He brings a practitioner's eye to the hype cycles and genuine innovations defining the field, helping readers separate signal from noise.


Tencent HY-WU: Per-Instance LoRA for Image Editing | aiHola