OpenBMB dropped VoxCPM 1.5 on December 5, and the headline number is the sampling rate: 44.1kHz, up from 16kHz in the original release. That's CD-quality audio from a model you can run locally.
The efficiency gains matter more than they first appear. VoxCPM 1.5 encodes one second of audio in 6.25 tokens instead of 12.5. Halving the token rate doesn't just speed things up; it also makes longer audio generation possible without blowing up memory. The team notes that RTF (real-time factor, the ratio of synthesis time to audio duration) on an RTX 4090 stays around 0.17, unchanged from the previous release despite the quality bump. The model was trained on 1.8 million hours of Chinese and English audio, which shows in how well it handles context-aware prosody.
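To put those numbers in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted above (12.5 and 6.25 tokens per second, RTF around 0.17). The function names are made up for illustration and are not part of VoxCPM, and the RTF is the company-reported value, so treat the timing as an estimate rather than a benchmark.

```python
# Back-of-the-envelope arithmetic using the figures quoted in the article.
# Function and constant names are illustrative, not part of any VoxCPM API.

OLD_TOKENS_PER_SEC = 12.5   # original release
NEW_TOKENS_PER_SEC = 6.25   # VoxCPM 1.5
REPORTED_RTF = 0.17         # company-reported, RTX 4090


def tokens_for(duration_s: float, tokens_per_sec: float) -> float:
    """Audio tokens needed to represent `duration_s` seconds of audio."""
    return duration_s * tokens_per_sec


def estimated_synthesis_time(duration_s: float, rtf: float = REPORTED_RTF) -> float:
    """RTF is synthesis time divided by audio duration, so time = duration * RTF."""
    return duration_s * rtf


clip_s = 60.0  # one minute of generated speech
old_tokens = tokens_for(clip_s, OLD_TOKENS_PER_SEC)
new_tokens = tokens_for(clip_s, NEW_TOKENS_PER_SEC)
synth_s = estimated_synthesis_time(clip_s)

print(f"Tokens at 12.5/s: {old_tokens:.0f}")
print(f"Tokens at 6.25/s: {new_tokens:.0f}")
print(f"Estimated synthesis time at RTF 0.17: {synth_s:.1f} s")
```

For a one-minute clip, the sequence the model has to generate drops from 750 tokens to 375, and at the reported RTF the clip would take roughly ten seconds to synthesize on the quoted hardware.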
New fine-tuning scripts for LoRA and full parameter training ship with this release. OpenBMB is clearly betting that developers want to customize voice cloning for specific use cases rather than treating TTS as a black box. The Apache 2.0 license stays intact.
VoxCPM-0.5B remains supported for anyone not ready to upgrade. Output quality still depends heavily on reference audio quality, per the release notes, so don't expect miracles from noisy clips.
The Bottom Line: A meaningful upgrade for open-source TTS, with the 44.1kHz sampling rate and halved token rate addressing the two biggest complaints about VoxCPM's first release.
QUICK FACTS
- Sampling rate: 44.1kHz (up from 16kHz)
- Token rate: 6.25 tokens per second of audio (previously 12.5)
- Training data: 1.8 million hours (bilingual Chinese/English)
- RTF: 0.17 on RTX 4090 (company-reported)
- License: Apache 2.0




