OpenBMB dropped VoxCPM2 this week, a 2-billion-parameter text-to-speech model that skips the usual token-based approach entirely. The model weights are live on Hugging Face under an Apache 2.0 license. Trained on over 2 million hours of multilingual speech data, VoxCPM2 covers 30 languages and outputs 48kHz audio, the sample rate used in studio production.
The headline feature: you can describe a voice in plain text (gender, age, tone, emotion) and VoxCPM2 will generate it from scratch. No reference audio needed. It also does zero-shot voice cloning from a short audio clip, with optional style controls for emotion and pacing. The GitHub repo includes streaming synthesis support and fine-tuning scripts that work with as little as 5 to 10 minutes of audio.
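That 5-to-10-minute fine-tuning floor is small enough to sanity-check with the standard library. A minimal sketch; the helper names and the 48kHz mono WAV assumption are mine, not from the repo:

```python
import io
import wave

def wav_duration_seconds(data: bytes) -> float:
    """Duration of a WAV file, given its raw bytes."""
    with wave.open(io.BytesIO(data), "rb") as w:
        return w.getnframes() / w.getframerate()

def total_minutes(clips: list[bytes]) -> float:
    """Total length of a set of clips, in minutes."""
    return sum(wav_duration_seconds(c) for c in clips) / 60.0

# Synthesize a silent 30-second, 48kHz mono clip for illustration.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)          # mono
    w.setsampwidth(2)          # 16-bit samples
    w.setframerate(48_000)     # VoxCPM2's output rate
    w.writeframes(b"\x00\x00" * 48_000 * 30)

clips = [buf.getvalue()] * 12  # twelve 30-second clips
print(total_minutes(clips))    # 6.0 -- inside the 5-10 minute window
```

Anything in the 5-to-10-minute range would pass; the actual fine-tuning scripts in the repo handle their own data loading.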
Built on the MiniCPM-4 backbone, VoxCPM2 uses a diffusion-autoregressive architecture that models speech directly in continuous space rather than first quantizing it into discrete tokens. OpenBMB claims this preserves acoustic detail that tokenizer-based systems lose. Benchmark results on Seed-TTS-eval and other standard tests are self-reported, and independent validation hasn't surfaced yet.
The earlier VoxCPM 1.0 reported a real-time factor of 0.17 on an RTX 4090, meaning audio is generated roughly six times faster than it plays back. VoxCPM2 fine-tuning supports both full SFT and LoRA, and a live demo is available on Hugging Face Spaces.
Bottom Line
VoxCPM2 is a 2B open-source TTS model covering 30 languages with voice design and cloning in a single unified architecture, though benchmark claims remain self-reported.
Quick Facts
- 2 billion parameters
- 30 languages supported
- 48kHz audio output
- Trained on 2M+ hours of speech data
- Apache 2.0 license
- Benchmarks are company-reported, not independently verified