ByteDance Open-Sources Lance 3B Multimodal Model

ByteDance's Intelligent Creation Lab has open-sourced Lance, a multimodal model that handles image and video understanding, generation, and editing inside a single framework. The technical paper was submitted to arXiv on May 18, 2026, and the weights are posted on Hugging Face under an Apache 2.0 license.

The pitch is efficiency. Lance runs on 3B active parameters and was trained from scratch on no more than 128 A100 GPUs, modest figures in a field where rivals casually ship 7B unified models. The architecture pairs a dual-stream mixture-of-experts setup with what the team calls modality-aware rotary positional encoding, which tags each visual token by its job (analyze this, condition on that, generate this) so the model stops confusing what's being asked of it mid-sequence.
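To make the role-tagging idea concrete, here is a toy sketch in plain Python. It is not Lance's formulation, which this article does not spell out. It assumes, purely for illustration, that a token's role shifts its position by a fixed offset before a standard rotary encoding is applied. The names ROLE_OFFSET, ROLE_STRIDE, and role_aware_rotate are hypothetical.

```python
import math

# Hypothetical role labels: the article names three jobs but not how they are encoded.
ROLE_OFFSET = {"understand": 0, "condition": 1, "generate": 2}
ROLE_STRIDE = 10_000  # assumed gap between role bands; an illustrative choice only


def rope_rotate(vec, position, base=10_000.0):
    """Apply standard rotary positional encoding to an even-length vector."""
    dim = len(vec)
    out = []
    for i in range(0, dim, 2):
        # Each consecutive pair of features rotates at its own frequency.
        theta = position * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def role_aware_rotate(vec, position, role):
    """Shift the position by a role-specific offset, then rotate (assumed scheme)."""
    return rope_rotate(vec, position + ROLE_OFFSET[role] * ROLE_STRIDE)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


q = [1.0, 0.0, 0.5, 0.5]  # toy query vector
k = [0.3, 0.7, 1.0, 0.0]  # toy key vector

# Same positions, but the key's role differs between the two scores.
same_role = dot(role_aware_rotate(q, 5, "generate"), role_aware_rotate(k, 3, "generate"))
cross_role = dot(role_aware_rotate(q, 5, "generate"), role_aware_rotate(k, 3, "condition"))
print(same_role, cross_role)
```

Under this assumption, two tokens with the same role score exactly as plain rotary encoding would, because the shared offset cancels. Tokens with different roles see an extra rotation, so attention can tell a frame to be generated from a frame supplied as conditioning even when both sit at the same index.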

On the scoreboard: 85.11 on VBench for video generation, 0.90 on GenEval for image generation, 62.0 on MVBench for video understanding, and 7.30 on GEdit-Bench for editing. The catch is that every one of those numbers comes from the authors' own paper, framed as best "among unified models," with no independent testing yet. The team itself flags Lance as a research artifact rather than a polished product, capped at 768x768 images and 480p, 12 FPS video.

Code, demos, and benchmark scripts are live on GitHub. The Apache 2.0 terms allow commercial use. So the real test now is whether developers reproduce the scores outside ByteDance's own setup.

Bottom Line

Lance claims top scores among unified models at 3B active parameters, but all benchmarks are self-reported in the team's paper.

Quick Facts

3B active parameters
Trained from scratch on up to 128 A100 GPUs
Released under Apache 2.0 license
Paper-reported scores: VBench 85.11, GenEval 0.90, MVBench 62.0, GEdit-Bench 7.30
arXiv paper submitted May 18, 2026

Tags: ByteDance, multimodal AI, open source, video generation, image editing, mixture of experts

Andrés Martínez

AI Content Writer

Andrés reports on the AI stories that matter right now. No hype, just clear, daily coverage of the tools, trends, and developments changing industries in real time. He makes the complex feel routine.
