
ByteDance Open-Sources Lance, a 3B Multimodal Model

One framework handles image and video understanding, generation, and editing at 3B parameters.

Andrés Martínez, AI Content Writer
May 27, 2026 · 2 min read

ByteDance's Intelligent Creation Lab has open-sourced Lance, a multimodal model that handles image and video understanding, generation, and editing inside a single framework. The technical paper was submitted to arXiv on May 18, 2026, and the weights are posted to Hugging Face under an Apache 2.0 license.

The pitch is efficiency. Lance runs on 3B active parameters and was trained from scratch on no more than 128 A100 GPUs, modest figures in a field where rivals casually ship 7B unified models. The architecture pairs a dual-stream mixture-of-experts setup with what the team calls modality-aware rotary positional encoding, which tags each visual token by its job (analyze this, condition on that, generate this) so the model stops confusing what's being asked of it mid-sequence.
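To make the role-tagging idea concrete, here is a toy sketch of a rotary encoding in which a token's role shifts its position before the rotation is applied. This is not Lance's implementation: the article does not describe the actual mechanism, so the role names, the offset values, and the choice to encode roles as position offsets are all assumptions made for illustration.

```python
import math

# Hypothetical role labels and offsets (assumed for illustration only;
# the paper's real scheme may differ entirely).
ROLE_OFFSETS = {"understand": 0, "condition": 1000, "generate": 2000}


def rope_rotate(vec, position, base=10000.0):
    """Apply standard rotary positional encoding to an even-length vector."""
    dim = len(vec)
    out = []
    for i in range(0, dim, 2):
        # Each consecutive pair of features is rotated at its own frequency.
        theta = position * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def role_aware_rotate(vec, index, role):
    """Rotate a token vector using its index plus an assumed per-role offset."""
    return rope_rotate(vec, index + ROLE_OFFSETS[role])


token = [1.0, 0.0, 0.5, -0.5]
as_input = role_aware_rotate(token, 3, "understand")
as_target = role_aware_rotate(token, 3, "generate")
```

In this sketch the same token at the same index gets a different rotation depending on its role, which is one simple way a model could tell "analyze this" apart from "generate this" within a single sequence.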

On the scoreboard: 85.11 on VBench for video generation, 0.90 on GenEval for image generation, 62.0 on MVBench for video understanding, and 7.30 on GEdit-Bench for editing. The catch is that every one of those numbers comes from the authors' own paper, framed as best "among unified models," with no independent testing yet. The team itself flags Lance as a research artifact rather than a polished product, capped at 768x768 images and 480p video at 12 FPS.

Code, demos, and benchmark scripts are live on GitHub. The Apache 2.0 terms allow commercial use. So the real test now is whether developers reproduce the scores outside ByteDance's own setup.
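For anyone attempting that reproduction, a small helper like the one below could flag where independent runs drift from the paper's figures. It is a minimal sketch: the reported scores are the ones quoted in this article, while the function name, the 2% relative tolerance, and the sample "reproduced" numbers are hypothetical and not results from any real run.

```python
# Scores as reported in the authors' paper (per this article).
PAPER_SCORES = {
    "VBench": 85.11,
    "GenEval": 0.90,
    "MVBench": 62.0,
    "GEdit-Bench": 7.30,
}


def compare_scores(reproduced, reported=PAPER_SCORES, rel_tol=0.02):
    """Label each benchmark as ok, missing, or off by a relative gap."""
    report = {}
    for name, target in reported.items():
        if name not in reproduced:
            report[name] = "missing"
            continue
        gap = (reproduced[name] - target) / target
        # rel_tol is an assumed threshold, not a community standard.
        report[name] = "ok" if abs(gap) <= rel_tol else f"off by {gap:+.1%}"
    return report


# Hypothetical numbers for illustration only.
print(compare_scores({"VBench": 85.0, "GenEval": 0.80}))
```

A relative tolerance is used because the four benchmarks are on different scales, so a fixed absolute gap would mean very different things for GenEval and VBench.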


Bottom Line

Lance claims top scores among unified models at 3B active parameters, but all benchmarks are self-reported in the team's paper.

Quick Facts

  • 3B active parameters
  • Trained from scratch on up to 128 A100 GPUs
  • Released under Apache 2.0 license
  • Paper-reported scores: VBench 85.11, GenEval 0.90, MVBench 62.0, GEdit-Bench 7.30
  • arXiv paper submitted May 18, 2026
Tags: ByteDance, multimodal AI, open source, video generation, image editing, mixture of experts
Andrés Martínez

AI Content Writer

Andrés reports on the AI stories that matter right now. No hype, just clear, daily coverage of the tools, trends, and developments changing industries in real time. He makes the complex feel routine.
