
OpenBMB Releases VoxCPM2, a 2B-Parameter Multilingual TTS Model

Open-source model supports 30 languages, voice cloning, and voice design from text descriptions.

Andrés Martínez, AI Content Writer
April 7, 2026 · 2 min read

OpenBMB released VoxCPM2 this week, a 2-billion-parameter text-to-speech model that skips the usual token-based approach entirely. The model weights are live on Hugging Face under an Apache 2.0 license. Trained on over 2 million hours of multilingual speech data, VoxCPM2 covers 30 languages and outputs 48 kHz audio, a sample rate that is standard in professional audio production.

The headline feature: you can describe a voice in plain text (gender, age, tone, emotion) and VoxCPM2 will generate it from scratch. No reference audio needed. It also does zero-shot voice cloning from a short audio clip, with optional style controls for emotion and pacing. The GitHub repo includes streaming synthesis support and fine-tuning scripts that work with as little as 5 to 10 minutes of audio.

Built on the MiniCPM-4 backbone, VoxCPM2 uses a diffusion autoregressive architecture that works directly in continuous space rather than converting speech to discrete tokens first. OpenBMB claims this preserves acoustic detail that tokenizer-based systems lose. Benchmarks are self-reported across Seed-TTS-eval and other standard tests, though independent validation hasn't surfaced yet.
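
To see why a discrete bottleneck can cost detail, here is a toy sketch. It is not VoxCPM2's code or architecture: it assumes a uniform scalar codebook and a sine wave standing in for an acoustic feature, whereas real speech tokenizers use learned codebooks. The function names are made up for illustration. The point is only that snapping a continuous value to a finite set of codes leaves a reconstruction error, and that the error shrinks as the codebook grows but never reaches zero.

```python
import math


def make_signal(n=200):
    """A smooth toy feature track: one sine cycle with values in [-1, 1]."""
    return [math.sin(2 * math.pi * i / n) for i in range(n)]


def quantize(values, levels):
    """Snap each value to the nearest of `levels` evenly spaced codes in [-1, 1]."""
    step = 2.0 / (levels - 1)
    return [round((v + 1.0) / step) * step - 1.0 for v in values]


def mean_squared_error(a, b):
    """Average squared difference between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


signal = make_signal()
for levels in (4, 16, 256):
    err = mean_squared_error(signal, quantize(signal, levels))
    print(f"{levels:>3} codes: reconstruction MSE = {err:.8f}")
```

A model that predicts continuous values directly has no such rounding step, which is the intuition behind OpenBMB's claim. Whether that advantage holds up in practice is exactly what independent benchmarks would need to show.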

The earlier VoxCPM 1.0 reported a real-time factor of 0.17 on an RTX 4090, meaning synthesis took about 0.17 seconds per second of audio. VoxCPM2 supports both full supervised fine-tuning (SFT) and LoRA. A live demo is available on Hugging Face Spaces.
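
To put a real-time factor in concrete terms, the short sketch below converts between synthesis time and audio length. The helper names are hypothetical, and the 0.17 figure is the one reported for VoxCPM 1.0, not a measurement of VoxCPM2.

```python
def real_time_factor(synthesis_seconds, audio_seconds):
    """RTF = time spent generating / duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def synthesis_time(audio_seconds, rtf):
    """Seconds of compute needed to produce `audio_seconds` of speech."""
    return audio_seconds * rtf


rtf = 0.17  # figure reported for VoxCPM 1.0 on an RTX 4090
print(f"10 s of speech takes about {synthesis_time(10.0, rtf):.1f} s to generate")
print(f"That is roughly {1 / rtf:.1f}x faster than real time")
```

Any value below 1.0 means the model generates speech faster than it plays back, which is the threshold that matters for streaming use.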


Bottom Line

VoxCPM2 is a 2B open-source TTS model covering 30 languages with voice design and cloning in a single unified architecture, though benchmark claims remain self-reported.

Quick Facts

  • 2 billion parameters
  • 30 languages supported
  • 48kHz audio output
  • Trained on 2M+ hours of speech data
  • Apache 2.0 license
  • Benchmarks are company-reported, not independently verified
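
For a rough sense of what those numbers mean on hardware, here is a back-of-envelope sketch. It assumes 16-bit weights and 16-bit mono PCM output, neither of which is stated in the release, and it counts only the weights, not activations or caches. The function names are illustrative.

```python
def weight_memory_gib(num_params, bytes_per_param=2):
    """Approximate memory for the weights alone, in GiB (assumes 16-bit by default)."""
    return num_params * bytes_per_param / 2**30


def nyquist_hz(sample_rate_hz):
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2


def pcm_bytes(seconds, sample_rate_hz, bytes_per_sample=2, channels=1):
    """Size of uncompressed PCM audio (assumes 16-bit mono by default)."""
    return int(seconds * sample_rate_hz * bytes_per_sample * channels)


print(f"2B params at 16 bits: {weight_memory_gib(2_000_000_000):.2f} GiB of weights")
print(f"48 kHz audio captures frequencies up to {nyquist_hz(48_000):.0f} Hz")
print(f"One minute of 48 kHz mono PCM: {pcm_bytes(60, 48_000) / 1e6:.2f} MB")
```

Under those assumptions the weights alone fit comfortably within the memory of a single consumer GPU, though actual requirements depend on precision and runtime overhead.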
