Voice Cloning/Synthesis

NVIDIA Open-Sources PersonaPlex-7B Full-Duplex Voice Model

A 7B-parameter model that listens and talks simultaneously, with customizable voices and personas.

Andrés Martínez
AI Content Writer
February 16, 2026 · 2 min read
Image: Abstract visualization of two overlapping audio waveforms, representing simultaneous listening and speaking in full-duplex voice AI.

NVIDIA released PersonaPlex-7B-v1, a full-duplex speech-to-speech model that can listen and generate speech at the same time. No waiting for turns. The model, available on Hugging Face with over 330,000 downloads already, replaces the traditional ASR-to-LLM-to-TTS pipeline with a single Transformer that handles everything in one pass.

The pitch is straightforward: most voice AI still works like a walkie-talkie. You talk, it waits, it thinks, it responds. PersonaPlex processes incoming audio while simultaneously generating its own speech, supporting interruptions, overlapping talk, and backchannels (the "uh-huh" and "right" that make conversations feel human). Built on Kyutai's Moshi architecture with a Helium language backbone, the 7B-parameter model also lets developers customize both voice and persona through audio and text prompts.
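The walkie-talkie versus full-duplex distinction can be sketched as a toy event loop: one thread keeps consuming incoming audio frames while another emits response frames, and a detected barge-in cuts generation short mid-stream. This is purely illustrative pseudologic for the behavior the article describes; none of the names here are PersonaPlex's actual API.

```python
import queue
import threading
import time

# Toy full-duplex loop: "listen" and "speak" run concurrently, and an
# incoming frame flagged as an interruption makes the speaker yield the
# floor immediately instead of finishing its turn.
def full_duplex_session(incoming_frames, response_frames):
    heard, spoken = [], []
    interrupted = threading.Event()
    inbox = queue.Queue()

    def listen():
        for frame in incoming_frames:
            inbox.put(frame)
            time.sleep(0.001)          # simulate real-time frame arrival
        inbox.put(None)                # end-of-stream sentinel

    def process():
        while (frame := inbox.get()) is not None:
            heard.append(frame)        # model keeps "hearing" while talking
            if frame == "INTERRUPT":
                interrupted.set()      # user barged in: stop generating

    def speak():
        for frame in response_frames:
            if interrupted.is_set():
                break                  # yield the floor mid-utterance
            spoken.append(frame)
            time.sleep(0.002)

    threads = [threading.Thread(target=f) for f in (listen, process, speak)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return heard, spoken
```

A half-duplex pipeline would instead run these three stages strictly one after another; the concurrent version is what lets overlapping talk and backchannels register while the model is still speaking.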

NVIDIA's own benchmarks, measured on FullDuplexBench, show smooth turn-taking latency at 0.170 seconds and interruption handling at 0.240 seconds. On dialog naturalness scores, PersonaPlex hit 2.95 MOS compared to 2.80 for Gemini and 2.81 for Qwen-2.5-Omni. Those numbers come from NVIDIA's research paper, so take them with appropriate caution: all benchmarks are company-reported, and the evaluator pool ranged from 152 to 202 people depending on the test category.

Code ships under MIT, weights under the NVIDIA Open Model License, both cleared for commercial use. You'll need serious hardware though: NVIDIA recommends at least 24 GB of VRAM. English only for now, with other languages on the roadmap.
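For the 24 GB recommendation, a pre-flight check is trivial to write. The helper below is a hypothetical sketch: it just compares a byte count against the threshold, and the comment notes one way to obtain that count if PyTorch with CUDA happens to be installed.

```python
# Minimal VRAM pre-flight check. With PyTorch + CUDA available, the byte
# count can come from torch.cuda.get_device_properties(0).total_memory.
def meets_vram_requirement(total_vram_bytes: int, required_gb: float = 24.0) -> bool:
    """Return True if the GPU's total memory meets the stated minimum."""
    return total_vram_bytes / 1024**3 >= required_gb
```

An RTX 3090's 24 GB passes exactly at the threshold; a 16 GB card does not.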

Bottom Line

PersonaPlex-7B is commercially licensed, open-weight, and already has 330,000+ Hugging Face downloads, but requires a 24 GB VRAM GPU and only supports English so far.

Quick Facts

  • 7 billion parameters, built on Moshi architecture
  • Smooth turn-taking latency: 0.170 seconds (company-reported)
  • User interruption latency: 0.240 seconds (company-reported)
  • Dialog naturalness MOS: 2.95 vs. Gemini's 2.80 (company-reported)
  • 330,000+ downloads on Hugging Face
  • Requires 24 GB+ VRAM (A10G, A40, RTX 3090/4090)
Tags: NVIDIA, PersonaPlex, full-duplex, voice AI, open source, speech-to-speech, conversational AI
Andrés Martínez
AI Content Writer

Andrés reports on the AI stories that matter right now. No hype, just clear, daily coverage of the tools, trends, and developments changing industries in real time. He makes the complex feel routine.

