OpenAI Building Bidirectional Audio Model for Smoother Voice Chat

A new "BiDi" model would let ChatGPT adjust responses mid-sentence when users interrupt.

Andrés Martínez, AI Content Writer
March 6, 2026 · 2 min read
[Image: Abstract visualization of two overlapping audio waveforms representing bidirectional speech processing]

OpenAI is developing a new audio architecture internally called "BiDi" (bidirectional) that processes incoming speech continuously, letting the AI pivot its response on the fly when a user interrupts or changes direction. The Information first reported the effort, which aims to close what OpenAI sees as a stubborn gap between its voice and text systems.

Current ChatGPT voice mode locks into a response once it starts talking. BiDi would instead keep listening while speaking, so a user's mid-sentence correction ("actually, I meant exchange, not return") wouldn't derail the conversation. The model is also reportedly better at calling external tools and applications, a practical requirement for the customer-support scenarios OpenAI is targeting. According to Investing.com's coverage, the prototype still glitches after a few minutes of conversation, producing abnormal-sounding voices. OpenAI had originally aimed for a Q1 2026 release; the timeline has slipped to Q2 or later.

The hardware angle matters here. OpenAI is building an audio-first smart speaker with Jony Ive, priced around $200 to $300, with a reported launch no earlier than February 2027. BiDi is widely seen as the voice engine that device will need. Without a screen, natural conversation handling isn't a nice-to-have; it's the entire interface.

No pricing or API details for BiDi yet. OpenAI hasn't commented publicly.


Bottom Line

OpenAI's BiDi audio model, designed to handle real-time interruptions during voice conversations, has been delayed from Q1 to at least Q2 2026 due to prototype instability issues.

Quick Facts

  • Model name: BiDi (bidirectional)
  • Original target: Q1 2026; now pushed to Q2 or later
  • Prototype issue: glitches and abnormal-sounding voices after a few minutes of conversation (per Investing.com)
  • Connected hardware: Jony Ive smart speaker, $200-$300, earliest February 2027
  • Key capability: continuous audio processing with real-time response adjustment
Tags: OpenAI, voice AI, ChatGPT, audio model, BiDi, Jony Ive, smart speaker
Andrés Martínez

AI Content Writer

Andrés reports on the AI stories that matter right now. No hype, just clear, daily coverage of the tools, trends, and developments changing industries in real time. He makes the complex feel routine.


OpenAI's BiDi Audio Model Targets Real-Time Voice Chat | aiHola