NVIDIA has released PersonaPlex-7B-v1, a full-duplex speech-to-speech model that can listen and generate speech at the same time. No waiting for turns. The model, available on Hugging Face with over 330,000 downloads already, replaces the traditional ASR-to-LLM-to-TTS pipeline (speech recognition, then a language model, then speech synthesis) with a single Transformer that takes audio in and produces audio out.
The pitch is straightforward: most voice AI still works like a walkie-talkie. You talk, it waits, it thinks, it responds. PersonaPlex processes incoming audio while simultaneously generating its own speech, supporting interruptions, overlapping talk, and backchannels (the "uh-huh" and "right" that make conversations feel human). Built on Kyutai's Moshi architecture with a Helium language backbone, the 7B-parameter model also lets developers customize both voice and persona through audio and text prompts.
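To make the walkie-talkie contrast concrete, here is a toy sketch of the two control flows. It is not PersonaPlex's or Moshi's actual API: `turn_based`, `full_duplex`, `step`, and the string "frames" are hypothetical stand-ins invented for illustration, and a real model works on audio tokens rather than words.

```python
from typing import List, Optional, Tuple


def step(frame: str) -> Optional[str]:
    """Hypothetical per-frame policy: react to a pause or an interruption."""
    if frame == "<pause>":
        return "uh-huh"            # backchannel while the user keeps the floor
    if frame == "<interrupt>":
        return "<stop speaking>"   # yield immediately when the user cuts in
    return None                    # otherwise stay silent for this frame


def turn_based(user_frames: List[str]) -> List[Tuple[int, str]]:
    """Half-duplex: nothing is emitted until the whole user turn has arrived."""
    heard = " ".join(user_frames)
    reply_time = len(user_frames)  # earliest possible output: after the last frame
    return [(reply_time, f"reply to: {heard}")]


def full_duplex(user_frames: List[str]) -> List[Tuple[int, str]]:
    """Full-duplex: each time step consumes one input frame and may emit output."""
    out: List[Tuple[int, str]] = []
    for t, frame in enumerate(user_frames):
        emitted = step(frame)
        if emitted is not None:
            out.append((t, emitted))
    return out


frames = ["so", "yesterday", "<pause>", "I", "went", "<interrupt>"]
print(turn_based(frames))   # one reply, at time step 6
print(full_duplex(frames))  # a backchannel at step 2, a yield at step 5
```

The point of the sketch is only the loop shape: in the full-duplex version, output can happen at any time step while input is still arriving, which is what makes backchannels and interruptions possible.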
NVIDIA's own benchmarks, measured on FullDuplexBench, show smooth turn-taking latency at 0.170 seconds and interruption handling at 0.240 seconds. On dialog naturalness, PersonaPlex scored a mean opinion score (MOS) of 2.95, compared to 2.80 for Gemini and 2.81 for Qwen-2.5-Omni. Those numbers come from NVIDIA's research paper, so take them with appropriate caution: all benchmarks are company-reported, and the evaluator pool ranged from 152 to 202 people depending on the test category.
Code ships under MIT, weights under the NVIDIA Open Model License, both cleared for commercial use. You'll need serious hardware though: NVIDIA recommends at least 24 GB of VRAM. English only for now, with other languages on the roadmap.
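A rough back-of-the-envelope calculation shows why a 24 GB card is a plausible floor. The sketch below assumes 16-bit weights, which this article does not confirm, and counts only the parameters; activations, caches, and the audio codec need extra memory on top.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


PARAMS = 7e9            # 7 billion parameters, per the article
RECOMMENDED_VRAM = 24   # GB, NVIDIA's stated recommendation

# Precisions here are assumptions for illustration, not NVIDIA's published format.
for name, nbytes in [("fp32", 4), ("16-bit (bf16/fp16)", 2), ("int8", 1)]:
    weights = weight_memory_gb(PARAMS, nbytes)
    headroom = RECOMMENDED_VRAM - weights
    print(f"{name}: {weights:.1f} GB weights, {headroom:.1f} GB headroom on a 24 GB card")
```

Under the 16-bit assumption the weights alone take about 14 GB, leaving roughly 10 GB for everything else, while 32-bit weights would not fit on a 24 GB card at all.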
Bottom Line
PersonaPlex-7B is open-weight, licensed for commercial use, and already has 330,000+ Hugging Face downloads, but NVIDIA recommends a GPU with at least 24 GB of VRAM and the model supports only English so far.
Quick Facts
- 7 billion parameters, built on Moshi architecture
- Smooth turn-taking latency: 0.170 seconds (company-reported)
- User interruption latency: 0.240 seconds (company-reported)
- Dialog naturalness MOS: 2.95 vs. Gemini's 2.80 (company-reported)
- 330,000+ downloads on Hugging Face
- Recommended VRAM: 24 GB or more (e.g. A10G, A40, RTX 3090/4090)




