OpenAI is developing a new audio architecture internally called "BiDi" (bidirectional) that processes incoming speech continuously, letting the AI pivot its response on the fly when a user interrupts or changes direction. The Information first reported the effort, which aims to close what OpenAI sees as a stubborn gap between its voice and text systems.
Current ChatGPT voice mode locks into a response once it starts talking. BiDi would instead keep listening while it speaks, so a user's mid-sentence correction ("actually, I meant exchange, not return") wouldn't derail the conversation. The model is also reportedly better than the current voice system at calling external tools and applications, a practical requirement for the customer-support scenarios OpenAI is targeting. According to Investing.com's coverage, the prototype still glitches after a few minutes of conversation, producing abnormal-sounding voices. OpenAI had originally aimed for a Q1 2026 release; the timeline has slipped to Q2 or later.
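Concretely, full-duplex handling means the listening loop runs concurrently with speech output and can cancel it mid-stream. Here is a minimal control-flow sketch in Python using asyncio; every name is hypothetical and this bears no relation to OpenAI's actual implementation or API, which remains unpublished:

```python
import asyncio

async def speak(response_words, spoken, interrupt):
    # Stream the response word by word; bail out early if interrupted.
    for word in response_words:
        if interrupt.is_set():
            return  # a correction arrived mid-response
        spoken.append(word)
        await asyncio.sleep(0)  # yield so the listener can run

async def listen(user_events, interrupt, corrections):
    # Keep processing incoming "audio" even while the assistant is speaking.
    for event in user_events:
        await asyncio.sleep(0)  # yield, interleaving with speak()
        if event.startswith("correction:"):
            corrections.append(event.split(":", 1)[1])
            interrupt.set()  # signal the speaker to stop mid-stream

async def converse(response_words, user_events):
    # Run speaking and listening concurrently, sharing an interrupt flag.
    interrupt = asyncio.Event()
    spoken, corrections = [], []
    await asyncio.gather(
        speak(response_words, spoken, interrupt),
        listen(user_events, interrupt, corrections),
    )
    return spoken, corrections
```

The point of the sketch is that a correction both halts the in-flight response and is captured for the next turn, rather than being queued behind a monologue the model is committed to finishing, which is the behavior the current voice mode reportedly lacks.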
The hardware angle matters here. OpenAI is building an audio-first smart speaker with Jony Ive, priced around $200 to $300, with a reported launch no earlier than February 2027. BiDi is widely seen as the voice engine that device will need. Without a screen, natural conversation handling isn't a nice-to-have; it's the entire interface.
No pricing or API details for BiDi yet. OpenAI hasn't commented publicly.
Bottom Line
OpenAI's BiDi audio model, designed to handle real-time interruptions during voice conversations, has been delayed from Q1 to at least Q2 2026 due to prototype instability issues.
Quick Facts
- Model name: BiDi (bidirectional)
- Original target: Q1 2026; now pushed to Q2 or later
- Prototype issue: glitches and abnormal-sounding voices after a few minutes of conversation (per Investing.com's coverage)
- Connected hardware: Jony Ive smart speaker, $200-$300, earliest February 2027
- Key capability: continuous audio processing with real-time response adjustment