The Qwen team just dropped the full Qwen3-TTS model lineup on GitHub and Hugging Face. Five models in total, all Apache 2.0 licensed: Base and CustomVoice variants in both 0.6B and 1.7B parameter classes, plus a 1.7B VoiceDesign model. The release includes their 12Hz tokenizer, which Qwen says compresses audio at roughly half the framerate of typical speech tokenizers while delivering better reconstruction quality.
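The framerate claim is easy to put in concrete terms: a discrete tokenizer emits one frame of tokens per tick, so halving the framerate halves the sequence length the language model has to generate. A quick sketch of that arithmetic (the 4-codebooks-per-frame figure is a hypothetical illustration, not a number from the release):

```python
import math

def tokens_for_audio(seconds: float, frame_rate_hz: float, codebooks_per_frame: int) -> int:
    """Total discrete tokens for a clip: frames times codebooks per frame."""
    frames = math.ceil(seconds * frame_rate_hz)
    return frames * codebooks_per_frame

# 10 seconds of speech, assuming (hypothetically) 4 codebooks per frame:
print(tokens_for_audio(10, 12, 4))  # 12Hz tokenizer -> 480 tokens
print(tokens_for_audio(10, 25, 4))  # typical 25Hz tokenizer -> 1000 tokens
print(tokens_for_audio(10, 50, 4))  # typical 50Hz tokenizer -> 2000 tokens
```

Shorter token sequences mean faster autoregressive generation and longer audio within the same context window, which is why the framerate matters beyond reconstruction quality.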
Voice cloning works from 3 seconds of reference audio across 10 languages. The VoiceDesign model takes a different approach: describe the voice you want in natural language and it generates it. Qwen claims their 1.7B-VoiceDesign model beats GPT-4o-mini-tts and Mimo-Audio-7B-Instruct on InstructTTS-Eval benchmarks, though these are self-reported figures. On the MiniMax TTS multilingual test set, their Base model reportedly achieves lower word error rates than ElevenLabs and MiniMax across most tested languages.
The architecture uses a discrete multi-codebook language model rather than the LM+DiT approach common in recent TTS systems; Qwen claims this avoids information bottlenecks and cascading errors between stages. Streaming generation is supported, with first-packet latency under 300ms for real-time applications. Fine-tuning documentation is included in the repo.
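The low tokenizer framerate also shapes the streaming budget: at 12Hz, each token frame covers roughly 83ms of audio, so only a handful of frames are needed before the first packet can ship. A back-of-envelope sketch (the 3-frame first-packet size is a hypothetical choice, not a documented Qwen setting):

```python
FRAME_RATE_HZ = 12  # the release's tokenizer framerate

def first_packet_audio_ms(frames_in_first_packet: int) -> float:
    """Audio duration (ms) carried by the first streamed packet."""
    return frames_in_first_packet * 1000 / FRAME_RATE_HZ

# A hypothetical 3-frame first packet carries 250ms of audio,
# which fits inside the sub-300ms first-packet latency target.
print(first_packet_audio_ms(3))
```

In other words, the model only needs to decode a few frames before meaningful audio can start playing, which is what makes the sub-300ms figure plausible for an autoregressive system.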
The Bottom Line: This gives developers a full open-source TTS stack with voice cloning and natural language voice control, removing dependency on closed APIs for these capabilities.
QUICK FACTS
- 5 models released: 0.6B and 1.7B variants of Base and CustomVoice, plus a 1.7B VoiceDesign
- 10 languages supported: Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian
- 3-second voice cloning via Base models
- 12Hz tokenizer (vs typical 25Hz or 50Hz)
- Apache 2.0 license
- vLLM day-0 support included