ElevenLabs Launches Dubbing v2, Skipping Transcripts to Keep Original Voices

The model conditions on the speaker's actual performance instead of a transcript, covering 90+ languages.

By Oliver Senti, Senior AI Editor
May 31, 2026 | 3 min read

ElevenLabs released Dubbing v2 on May 28, 2026, an AI dubbing model that translates speech across more than 90 languages while trying to keep the original speaker's emotion, pacing, and delivery intact. The company announced it on its research blog, and the tool is live now inside two products, ElevenCreative and ElevenProductions. API access is not yet available.

What's actually different here

The pitch hinges on one architectural choice. Most AI dubbing leans on a transcript: convert speech to text, translate the text, generate new audio. Accurate, sure, but the result tends to sound like a stranger reading your words. Dubbing v2 instead conditions directly on the source performance, so the model is listening to how something was said, not just what.
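To make the contrast concrete, here is a toy sketch of the two information flows. It is not ElevenLabs' implementation, which has not been published; every name in it (`Utterance`, `cascaded_dub`, `performance_conditioned_dub`, the one-entry phrasebook) is hypothetical, and a real system works on audio rather than labeled fields. The only point is where the delivery information gets dropped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    """Toy stand-in for an audio clip: the words plus how they were delivered."""
    words: str
    language: str
    emotion: str      # e.g. "excited"
    pace_wps: float   # words per second


# Hypothetical one-entry phrasebook standing in for a real translation model.
TOY_TRANSLATIONS = {("en", "es", "we won"): "ganamos"}


def translate(words: str, src: str, dst: str) -> str:
    return TOY_TRANSLATIONS[(src, dst, words)]


def cascaded_dub(source: Utterance, target_language: str) -> Utterance:
    """Transcript-based pipeline: speech -> text -> translated text -> speech."""
    transcript = source.words  # speech-to-text keeps only the words
    translated = translate(transcript, source.language, target_language)
    # The synthesis step never saw the delivery, so it falls back to defaults.
    return Utterance(translated, target_language, emotion="neutral", pace_wps=2.5)


def performance_conditioned_dub(source: Utterance, target_language: str) -> Utterance:
    """Assumed shape of a speech-to-speech flow: delivery is passed through."""
    translated = translate(source.words, source.language, target_language)
    return Utterance(translated, target_language,
                     emotion=source.emotion, pace_wps=source.pace_wps)


source = Utterance("we won", "en", emotion="excited", pace_wps=4.0)
print(cascaded_dub(source, "es"))
print(performance_conditioned_dub(source, "es"))
```

Both functions produce the same translated words; only the second carries the speaker's emotion and pace into the output, which is the difference the article is describing.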

That is the speech-to-speech approach a lot of labs have been circling. ElevenLabs frames it as solving the flat, disconnected audio problem, and the early reaction from creators on social media has been loud. One described it as performance that finally survives translation, which is a nice line, though nobody has published independent benchmarks against human dubbing studios yet. Coverage from AI Weekly flagged exactly that gap, plus the unanswered question of what voice data the model was trained on across all 90-plus languages.

The language count, with an asterisk

The headline number is 90+ languages. Some of the material circulating around the launch claims this is a jump from 29, but that 29 figure traces back to ElevenLabs' older Multilingual v2 speech model, not its previous dubbing product, so treat the leap as marketing math rather than a clean generational upgrade.

The sync handling is the less flashy part that probably matters more day to day. Dubbing v2 adapts phrasing for each language rather than translating word for word, then aligns the starts, stops, and pauses to the original timing automatically. Anyone who has manually nudged subtitle timings knows why that is worth something.
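For a sense of what automatic timing alignment involves, here is a minimal sketch of one common approach: fit each dubbed segment into the time window of the original line by speeding it up within a limit. How Dubbing v2 actually does this is not documented; `Segment`, `fit_to_window`, and the 1.2x speed-up cap are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    start: float  # seconds from the start of the video
    end: float


def fit_to_window(original: Segment, dubbed_duration: float,
                  max_speedup: float = 1.2) -> tuple[Segment, float, float]:
    """Place a dubbed line at the original start time.

    Returns (placed segment, playback rate, overrun in seconds).
    A rate above 1.0 means the dubbed audio is played faster than recorded.
    """
    window = original.end - original.start
    if window <= 0 or dubbed_duration <= 0:
        raise ValueError("durations must be positive")

    # Rate needed to squeeze the dub into the original window.
    needed = dubbed_duration / window
    # Never slow the dub down (leave a pause instead), and cap the speed-up
    # so the voice does not sound rushed.
    rate = min(max(needed, 1.0), max_speedup)

    fitted = dubbed_duration / rate
    overrun = max(0.0, fitted - window)
    return Segment(original.start, original.start + fitted), rate, overrun


# A 2.2 s translated line that has to fit a 2.0 s original line.
placed, rate, overrun = fit_to_window(Segment(10.0, 12.0), dubbed_duration=2.2)
print(placed, round(rate, 2), round(overrun, 2))
```

A segment that still overruns at the speed-up cap is the case where rephrasing the translation, rather than stretching the audio, becomes the better fix, which is presumably why the model adapts phrasing per language first.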

Who it's for, and what it costs

Two front doors. ElevenCreative targets creators and marketers, with one-click YouTube localization and a new Creator Dubbing Partner Program offering discounted access to eligible applicants. ElevenProductions is the studio and broadcaster lane, pairing the model with human translators, voice casting, and audio mixing.

ElevenLabs says professional dubbing can run hundreds of dollars per minute, which is the number it wants you comparing against. For the seven days following launch, the free plan includes one minute of dubbing, the Starter plan 15 minutes, and Creator+ plans 30.

The public API is the obvious missing piece. It is listed as coming soon, with no date, and ElevenLabs is routing interested developers to its sales team in the meantime. Until that ships, programmatic access stays gated.

Tags: ElevenLabs, AI dubbing, speech-to-speech, voice cloning, AI translation, localization, Dubbing v2, generative audio