OpenAI Launches GPT-Realtime-2 Voice Model in API


OpenAI shipped three voice models to its Realtime API on Thursday, all built for live conversation rather than batch transcription. The headline release is GPT-Realtime-2, which the company's announcement calls its first voice model with GPT-5-class reasoning. Context jumps from 32K to 128K tokens. Reasoning effort is adjustable across five tiers from minimal to xhigh, and the model can fire parallel tool calls while saying things like "let me check that" so the line doesn't go dead while it works.
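To make the parallel-tool-call pattern concrete, here is a minimal sketch of what it looks like from the application side: a short spoken filler goes out first, then two slow lookups run concurrently instead of one after the other. This is not the Realtime API. The tool functions, the `say` callback, and the sample arguments are all hypothetical stand-ins, and in the real product the filler phrase is produced by the model itself.

```python
import asyncio


async def lookup_order(order_id: str) -> dict:
    """Hypothetical tool: stands in for a slow backend call."""
    await asyncio.sleep(0.05)
    return {"order_id": order_id, "status": "shipped"}


async def lookup_weather(city: str) -> dict:
    """Hypothetical tool: a second, independent slow call."""
    await asyncio.sleep(0.05)
    return {"city": city, "forecast": "clear"}


async def handle_turn(say) -> list:
    # Emit the filler first so the caller hears something right away.
    say("Let me check that.")
    # Run both tools concurrently; total wait is roughly the slowest call,
    # not the sum of both.
    results = await asyncio.gather(
        lookup_order("A123"),
        lookup_weather("Madrid"),
    )
    return list(results)


def run_demo():
    spoken = []
    results = asyncio.run(handle_turn(spoken.append))
    return spoken, results
```

Calling `run_demo()` returns the filler phrase that was "spoken" and the two tool results, in the order the calls were passed to `asyncio.gather`.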

The other two are narrower. GPT-Realtime-Translate translates speech from 70+ input languages into 13 output languages live. GPT-Realtime-Whisper is a streaming speech-to-text model meant for captions and meeting notes produced as the speaker talks, not for transcribing recordings after the fact.

Pricing splits by model. Realtime-2 stays on per-token billing at $32 per million input audio tokens and $64 per million output audio tokens, with cached input at $0.40. Translate runs $0.034 per minute and Whisper $0.017 per minute, flat rates that are easier to forecast against than per-token billing.
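For a rough sense of how the two billing styles compare, the sketch below turns the quoted prices into a small estimator. The rates are the ones reported above; the function names are illustrative, and the cached-input rate is assumed to be per million tokens like the other two. The article does not say how many audio tokens a minute of speech consumes, so token counts are left as inputs rather than guessed.

```python
# Prices quoted in the article, in USD.
REALTIME2_INPUT_PER_M = 32.00   # per 1M input audio tokens
REALTIME2_OUTPUT_PER_M = 64.00  # per 1M output audio tokens
REALTIME2_CACHED_PER_M = 0.40   # cached input (assumed to be per 1M tokens)
TRANSLATE_PER_MIN = 0.034
WHISPER_PER_MIN = 0.017


def realtime2_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimated GPT-Realtime-2 cost in USD for the given token counts."""
    total = (
        input_tokens * REALTIME2_INPUT_PER_M
        + output_tokens * REALTIME2_OUTPUT_PER_M
        + cached_tokens * REALTIME2_CACHED_PER_M
    )
    return total / 1_000_000


def per_minute_cost(minutes: float, rate_per_min: float) -> float:
    """Estimated cost in USD for a per-minute model."""
    return minutes * rate_per_min
```

As a usage example, `per_minute_cost(10_000, TRANSLATE_PER_MIN)` comes to about $340 and the same volume on Whisper to about $170, while `realtime2_cost(1_000_000, 1_000_000)` comes to $96. The per-minute figures need only a call-volume forecast; the per-token figure also needs an estimate of tokens per call.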

OpenAI's self-reported benchmarks show Realtime-2 (high) scoring 15.2% above Realtime-1.5 on Big Bench Audio, and the xhigh variant scoring 13.8% higher on Audio MultiChallenge for instruction-following. Zillow, an early tester, reports a 26-point jump in call success rate (95% vs. 69%) on what it calls its hardest adversarial benchmark, after prompt tuning. Both figures come from interested parties.
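A percentage-point gain and a relative percentage gain are different measurements, and the benchmark figures above do not say which one "15.2%" and "13.8%" refer to. The sketch below shows the distinction using the Zillow numbers, where both the before and after rates are given. The function names are illustrative.

```python
def point_gain(new_pct: float, old_pct: float) -> float:
    """Absolute change, in percentage points."""
    return new_pct - old_pct


def relative_gain(new_pct: float, old_pct: float) -> float:
    """Change relative to the old value, as a percentage."""
    return (new_pct - old_pct) / old_pct * 100


# Zillow's reported call success rates: 95% after, 69% before.
points = point_gain(95, 69)       # 26 percentage points
relative = relative_gain(95, 69)  # about 37.7% relative improvement
```

The same 95% vs. 69% result can be described as a 26-point jump or as a roughly 38% relative improvement, which is why the unit matters when comparing vendor-reported numbers.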

All three models are live in the API now, documented in the Realtime developer guide. EU data residency is supported.

Bottom Line

GPT-Realtime-2 quadruples the context window to 128K tokens and stays at $32/$64 per million audio input/output tokens.

Quick Facts

Three models: GPT-Realtime-2, GPT-Realtime-Translate, GPT-Realtime-Whisper
Context window: 128K tokens, up from 32K
GPT-Realtime-2 pricing: $32/1M input audio tokens, $64/1M output, $0.40 cached input
Translate: $0.034/min; Whisper: $0.017/min
Translate supports 70+ input languages, 13 output languages (company-reported)
Zillow reports 95% vs. 69% call success rate on its own benchmark (unverified)

Tags: openai, voice-ai, realtime-api, gpt-realtime-2, speech-models, voice-agents

Andrés Martínez

AI Content Writer

Andrés reports on the AI stories that matter right now. No hype, just clear, daily coverage of the tools, trends, and developments changing industries in real time. He makes the complex feel routine.
