Thinking Machines Lab, the startup founded by former OpenAI CTO Mira Murati, announced a research preview of its "interaction models" on Monday. The company blog frames them as multimodal AI that listens, watches, and talks at the same time, instead of waiting for users to finish a thought before responding.
The flagship, TML-Interaction-Small, is a 276 billion parameter mixture-of-experts model with 12 billion active parameters. On FD-bench V1, an external interactivity benchmark, Thinking Machines reports a turn-taking latency of 0.40 seconds, versus 0.57 for Google's Gemini-3.1-flash-live and 1.18 for GPT-realtime-2.0 in their minimal settings. The numbers are self-reported.
Architecturally, the model drops the standard alternating-token sequence for a multi-stream design that processes inputs and emits outputs in 200ms increments. A separate background model handles longer reasoning and tool calls, then weaves results back in mid-conversation. "We've designed an AI that works with people the same way people collaborate," the company posted on X. That's the pitch; some of the more striking qualitative claims rest on internal benchmarks the company built itself, TimeSpeak and CueSpeak, where rival systems scored near zero.
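Thinking Machines hasn't published implementation details, but the described loop, consuming input and emitting output in fixed 200ms micro-turns while a background process feeds results back in, can be sketched in toy form. Everything below (function names, chunk format) is illustrative, not TML's actual API.

```python
from collections import deque

MICRO_TURN_MS = 200  # the 200 ms increment described in the announcement

def run_micro_turns(input_chunks, background_results=None):
    """Toy simulation of a multi-stream loop: each tick consumes one
    input chunk and emits a foreground reaction immediately, weaving
    finished background work (long reasoning, tool calls) back into
    the conversation as it becomes available.

    All names and structures here are hypothetical illustrations."""
    background = deque(background_results or [])
    transcript = []
    for tick, chunk in enumerate(input_chunks):
        t_ms = tick * MICRO_TURN_MS
        # Foreground stream: react to the latest input without waiting
        # for the speaker to finish their whole thought.
        transcript.append((t_ms, f"ack:{chunk}"))
        # Weave in any completed background result mid-conversation.
        if background:
            transcript.append((t_ms, f"bg:{background.popleft()}"))
    return transcript
```

The key contrast with an alternating-token design is that input consumption and output emission are interleaved on a fixed clock rather than strictly turn-by-turn, so a slow tool call never blocks the conversational stream.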
Access is gated. Pricing wasn't disclosed. A limited research preview opens in the coming months, with a wider release planned for later in 2026.
Bottom Line
TML-Interaction-Small is a 276B-parameter MoE that reports a 0.40-second turn-taking latency on FD-bench, ahead of rival systems from Google and OpenAI, though the numbers are self-reported and the most striking claims rest on the company's own benchmarks.
Quick Facts
- Model: TML-Interaction-Small, 276B-parameter MoE with 12B active
- FD-bench V1 turn-taking latency: 0.40 seconds, company-reported
- Gemini-3.1-flash-live (minimal): 0.57s; GPT-realtime-2.0 (minimal): 1.18s
- Processing increments: 200 milliseconds per micro-turn
- Announced May 11, 2026; wider release planned later in 2026