On April 7, a model nobody had heard of appeared on the Artificial Analysis Video Arena and immediately topped both the text-to-video and image-to-video leaderboards. No press release. No company name attached. Just "HappyHorse-1.0" and a freshly registered domain.
Within 48 hours, The Information reported that Alibaba was behind it, citing two people with knowledge of the company's plans. The official site now confirms it: HappyHorse was built by the Future Life Lab team inside Alibaba's Taotian Group, led by Zhang Di, formerly VP of Kuaishou and the technical architect behind Kling AI.
The stealth approach worked. By the time anyone figured out who made it, thousands of blind votes had already piled up.
The numbers
In the no-audio text-to-video category, HappyHorse posted an Elo between 1333 and 1357, beating ByteDance's Seedance 2.0 by roughly 60 points. In image-to-video without audio, the gap was smaller but the absolute score was higher: between 1391 and 1406 Elo, a new arena record. In the audio-inclusive tracks, Seedance 2.0 still holds the top spot, with HappyHorse trailing by about 14 Elo points in text-to-video.
Those Elo gaps matter more than they might seem. A 40-point difference in this system means users can reliably tell the outputs apart in blind comparisons. A 60-point lead is substantial.
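To put those gaps in concrete terms, here's the standard Elo expected-score formula. This assumes the arena uses the conventional logistic curve with a 400-point divisor, which is the usual convention but isn't something Artificial Analysis documents.

```python
# Expected win rate for a rating gap `diff` under standard Elo scaling.
# Assumption: a conventional logistic curve with a 400-point divisor;
# the arena's exact rating setup isn't published.
def elo_win_prob(diff: float) -> float:
    return 1 / (1 + 10 ** (-diff / 400))

print(f"{elo_win_prob(40):.3f}")  # ~0.557: preferred in about 56% of blind matchups
print(f"{elo_win_prob(60):.3f}")  # ~0.585: preferred in about 59% of matchups
print(f"{elo_win_prob(14):.3f}")  # ~0.520: the audio-track gap is close to a coin flip
```

In other words, even a 60-point lead translates to winning roughly three out of every five head-to-head votes, which is why sample size and rating stability matter.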
But here's the thing: these scores come from around 5,000 to 6,300 votes per category. That's a decent sample size for a preference arena, though not an enormous one. And the model appeared less than a week ago. Elo ratings tend to stabilize over time, and early entrants sometimes benefit from novelty effects in blind tests, where voters gravitate toward the output that looks "different" from what they've seen before.
It's basically daVinci-MagiHuman with more training
The most interesting thread isn't who built it but where it came from. Community investigators quickly noticed that HappyHorse's architecture description is nearly identical to daVinci-MagiHuman, an open-source model released in March by Sand.ai and the GAIR Lab at Shanghai Jiao Tong University.
Same 15 billion parameters. Same 40-layer single-stream transformer with the sandwich layout (modality-specific layers at positions 1 through 4 and 37 through 40, shared weights in the middle). Same MagiCompiler for inference acceleration. Same 8-step DMD-2 distillation. Same claim of 38 seconds for a 1080p clip on an H100. Even the WER numbers for lip-sync quality line up: daVinci-MagiHuman reported 14.60% on its Hugging Face page, and HappyHorse's self-reported benchmarks match.
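For readers who want the sandwich layout spelled out, here's a minimal sketch of the layer schedule as described. Everything below (class choices, hidden size, head count) is illustrative, since neither model's code has actually shipped yet.

```python
# Illustrative sketch of a 40-layer "sandwich" stack: modality-specific
# blocks at layers 1-4 and 37-40, shared blocks at layers 5-36. Hidden
# size and head count are placeholders, not published figures.
import torch.nn as nn

MODALITIES = ("text", "image", "video", "audio")

def build_sandwich(num_layers=40, d_model=2048, n_heads=16):
    layers = []
    for i in range(1, num_layers + 1):
        if 5 <= i <= 36:
            # Middle of the sandwich: one block shared by every modality.
            layers.append(nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True))
        else:
            # Bread of the sandwich: a separate block per modality, applied
            # only to that modality's tokens in the sequence.
            layers.append(nn.ModuleDict({
                m: nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
                for m in MODALITIES
            }))
    return nn.ModuleList(layers)
```

The apparent point of the layout is to let each modality adapt at the entry and exit of the network while the bulk of the capacity in the middle is shared.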
A DEV Community post spells out the connection more directly: Sand.ai founder Cao Yue and Zhang Di's team jointly iterated on the model, with this round of optimization focused on user-preference scenarios, character expressions, and visual aesthetics. So HappyHorse appears to be daVinci-MagiHuman after a focused tuning pass aimed squarely at winning arena votes.
I genuinely don't know whether that makes the arena result more or less impressive. On one hand, the underlying architecture was already public. On the other, the preference-tuning clearly worked.
What the architecture actually does
The single-stream design is the part worth paying attention to. Most video generation models, including the closed-source ones from Google (Veo), OpenAI (Sora), and ByteDance, use cross-attention to condition the video stream on text. HappyHorse, like daVinci-MagiHuman before it, skips that entirely. Text tokens, image latents, video latents, and audio tokens get concatenated into one sequence and run through standard self-attention. No separate encoders per modality. No fusion blocks.
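Here's what that looks like in code, as a hedged sketch: the tensor shapes and the assumption that every modality has already been projected to a common hidden size are mine, since the inference code isn't out yet.

```python
# Single-stream conditioning, sketched. Assumes each modality is already
# tokenized and projected to a shared hidden size D; shapes below are
# illustrative, not HappyHorse's actual token counts.
import torch
import torch.nn as nn

B, D = 1, 2048
text_tok  = torch.randn(B,   77, D)  # prompt tokens
image_lat = torch.randn(B,  256, D)  # reference-image latents
video_lat = torch.randn(B, 4096, D)  # video latents being denoised
audio_tok = torch.randn(B,  128, D)  # audio tokens

block = nn.TransformerEncoderLayer(D, 16, batch_first=True)

# One concatenated sequence, one self-attention pass. Text conditions the
# video simply by sitting in the same attention window: no cross-attention,
# no per-modality fusion module.
seq = torch.cat([text_tok, image_lat, video_lat, audio_tok], dim=1)
out = block(seq)
print(out.shape)  # torch.Size([1, 4557, 2048])
```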
The practical upside is speed and simplicity. The practical risk is that cramming everything into one attention window makes the model harder to scale to longer videos or higher resolutions without running into memory walls. The current output is five seconds at 1080p. That's fine for demos and short-form content. Whether the architecture extends to, say, 30-second clips at the same quality is an open question the team hasn't addressed.
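The memory concern is easy to quantify in rough terms: self-attention cost grows with the square of sequence length. The back-of-envelope below uses an invented token count, since no real figures have been published, but the scaling factor holds regardless.

```python
# Rough scaling math: attention FLOPs and the score-matrix activations
# grow as O(n^2) in sequence length n. The 5-second token count is an
# invented placeholder, not a published figure.
tokens_5s = 30_000
for seconds in (5, 10, 30):
    n = tokens_5s * seconds // 5
    print(f"{seconds:>2}s clip: {n:>7} tokens, attention cost x{(n / tokens_5s) ** 2:.0f}")
# 5s: x1, 10s: x4, 30s: x36 relative to the current output length
```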
The "open source" asterisk
The official site says weights, distilled models, super-resolution modules, and inference code will all be released. As of April 10, both the GitHub and Hugging Face links still say "coming soon." The press releases distributed through ABNewswire claim everything is already publicly available on GitHub, which is not accurate.
This matters because the open-source framing is doing a lot of heavy lifting in the narrative around HappyHorse. If the weights never materialize, or if they arrive months later with restrictive licensing, the story changes considerably. daVinci-MagiHuman is genuinely open under Apache 2.0, so there's precedent from the same collaborators. But precedent is not a promise.
Why the stealth drop?
Anonymous submissions to AI leaderboards have become a pattern in the Chinese AI ecosystem. Earlier this year, a mystery model called Pony Alpha appeared on OpenRouter and turned out to be Z.ai's GLM-5 doing a stress test. The playbook is: rank first under a pseudonym, generate buzz, then reveal your identity once the numbers speak for themselves.
Zhang Di's team took this a step further by registering the domain the same day the model appeared on the arena. 2026 is the Year of the Horse in the Chinese zodiac, which explains the name (and the same-day domain registration suggests this was planned, not spontaneous).
For Alibaba, the move makes strategic sense. Their existing open-source video model, Wan, sits well below HappyHorse on the leaderboard. Launching under a new brand, with a team led by a hire from a competitor, lets them test the waters without cannibalizing or confusing the Wan product line.
What's missing
No technical report yet. No independent benchmarks beyond the arena votes and the self-reported numbers on the website. No ablation studies showing what the preference tuning actually changed relative to daVinci-MagiHuman. No information on training data or compute costs.
The arena rankings are real, third-party, and blind. That counts for something. But a single leaderboard, even a well-run one, is not a comprehensive evaluation. We don't know how HappyHorse handles edge cases, long prompts, complex multi-character scenes, or anything outside the arena's prompt distribution.
The team says a full technical report will accompany the open-source release. Until then, what we have is a strong arena showing, a confirmed Alibaba connection, and a lot of press releases that got ahead of the actual availability.