NVIDIA Releases PersonaPlex: Full-Duplex Conversational AI Model with Customizable Voice and Role

On January 15, 2026, NVIDIA announced PersonaPlex, a 7-billion-parameter full-duplex speech-to-speech conversational AI model that supports simultaneous listening and speaking.

The model builds on Kyutai's Moshi architecture. Instead of a traditional cascaded pipeline (ASR → LLM → TTS), it uses a single integrated model that processes the user's audio and generates output speech in real time, avoiding the delays that accumulate across separate stages. Associated API descriptions report latency of approximately 170 ms. The model also handles interruptions, overlapping speech, pauses, and backchannels such as "uh-huh" or "yeah."
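The toy Python sketch below contrasts the two designs. It is not PersonaPlex's implementation, and every name in it (`cascaded_turn`, `full_duplex`, `toy_step`, `REPLY`) is hypothetical. "Frames" are plain strings, where an empty string means the user is silent. The cascaded function cannot produce output until all three stages have run over the whole turn. The full-duplex loop emits one output frame for every input frame, so it can stop talking when the user interrupts and resume during a pause.

```python
from typing import Callable, List, Sequence, Tuple


def cascaded_turn(
    user_frames: Sequence[str],
    asr: Callable[[Sequence[str]], str],
    llm: Callable[[str], str],
    tts: Callable[[str], List[str]],
) -> List[str]:
    # Each stage consumes the complete output of the previous one, so no reply
    # audio exists until ASR, the LLM, and TTS have all run on the user's turn.
    transcript = asr(user_frames)
    reply_text = llm(transcript)
    return tts(reply_text)


# A full-duplex step maps (incoming frame, state) -> (outgoing frame, new state).
Step = Callable[[str, int], Tuple[str, int]]


def full_duplex(user_frames: Sequence[str], step: Step, state: int = 0) -> List[str]:
    # One output frame per input frame: listening and speaking share one clock
    # instead of alternating in strict turns.
    outputs: List[str] = []
    for frame in user_frames:
        out, state = step(frame, state)
        outputs.append(out)
    return outputs


REPLY = ["Hello", "there", "friend"]  # hypothetical planned reply, one word per frame


def toy_step(frame: str, idx: int) -> Tuple[str, int]:
    # Non-empty frame means the user is speaking: stay silent and yield.
    if frame:
        return "", idx
    # Empty frame means the user paused: continue the reply where it left off.
    if idx < len(REPLY):
        return REPLY[idx], idx + 1
    return "", idx
```

For example, `full_duplex(["hi", "", "", "wait", ""], toy_step)` returns `["", "Hello", "there", "", "friend"]`: the model goes quiet on the interrupting "wait" frame and finishes its reply on the next pause. A real model learns this behavior from data rather than following a hand-written rule.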

PersonaPlex takes two inputs that define its behavior: a text prompt describing the role, background, and context, and an audio embedding (the voice prompt) that captures vocal characteristics, speaking style, and prosody. It ships with 16 pre-built voices spanning a range of accents, genders, and styles, and it keeps the selected persona consistent across interactions.
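The two-part conditioning can be pictured as a small data structure that pairs a text prompt with one voice embedding. This is a minimal sketch under stated assumptions, not PersonaPlex's actual interface: `Persona`, `make_persona`, `PRESET_VOICES`, the `voice_NN` ids, the prompt layout, and the 8-dimensional placeholder vectors are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

EMBED_DIM = 8  # hypothetical embedding size, chosen only for illustration

# Placeholder stand-ins for the 16 pre-built voices. The ids and vectors are
# made up; real voice embeddings would come from the model release.
PRESET_VOICES: Dict[str, Tuple[float, ...]] = {
    f"voice_{i:02d}": tuple([float(i)] * EMBED_DIM) for i in range(16)
}


@dataclass(frozen=True)
class Persona:
    text_prompt: str                 # role, background, and context
    voice_prompt: Tuple[float, ...]  # vocal characteristics, style, prosody


def make_persona(role: str, context: str, voice_id: str) -> Persona:
    # Pair a text prompt with one preset voice embedding; reject unknown ids.
    if voice_id not in PRESET_VOICES:
        raise KeyError(f"unknown voice: {voice_id}")
    text = f"Role: {role}\nContext: {context}"
    return Persona(text_prompt=text, voice_prompt=PRESET_VOICES[voice_id])
```

For instance, `make_persona("a wise and friendly teacher", "answers general-knowledge questions", "voice_03")` bundles the teacher role with one preset voice. Making the object immutable mirrors the idea that a persona is chosen once and then held fixed for the conversation.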

Demonstrated examples include:
- A wise and friendly teacher who answers general-knowledge questions, can be interrupted, and takes turns naturally.

The training dataset combines 7,303 real conversations (1,217 hours) from the Fisher English corpus, back-annotated with prompts generated by GPT-OSS-120B, and synthetic data: 39,322 assistant-role conversations (410 ho...
