
Thinking Machines Previews Real-Time Interaction Models

Murati's lab unveils multimodal AI that listens and talks at once, claiming 0.40s latency.

Andrés Martínez, AI Content Writer
May 12, 2026 · 2 min read
[Image: Abstract illustration of simultaneous audio, video, and text streams flowing between a human and an AI presence in real time]

Thinking Machines Lab, the startup founded by former OpenAI CTO Mira Murati, announced a research preview of its "interaction models" on Monday. The company blog frames them as multimodal AI that listens, watches, and talks at the same time, instead of waiting for users to finish a thought before responding.

The flagship, TML-Interaction-Small, is a 276-billion-parameter mixture-of-experts model with 12 billion active parameters. On FD-bench V1, an external interactivity benchmark, Thinking Machines reports a turn-taking latency of 0.40 seconds, versus 0.57 seconds for Google's Gemini-3.1-flash-live and 1.18 seconds for GPT-realtime-2.0 in their minimal settings. The numbers are self-reported.

Architecturally, the model drops the standard alternating-token sequence for a multi-stream design that processes inputs and emits outputs in 200ms increments. A separate background model handles longer reasoning and tool calls, then weaves results back in mid-conversation. "We've designed an AI that works with people the same way people collaborate," the company posted on X. That's the pitch; some of the more striking qualitative claims rest on internal benchmarks the company built itself, TimeSpeak and CueSpeak, where rival systems scored near zero.
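Thinking Machines hasn't published implementation details, but the announced design, a foreground loop that emits output every 200ms increment while a slower background model weaves results back in, resembles a familiar async pattern. A minimal sketch, with all names, timings, and logic hypothetical:

```python
import asyncio

CHUNK_MS = 200  # the 200 ms micro-turn increment described in the announcement

async def background_reasoner(query: str) -> str:
    # Stand-in for the separate model that handles longer reasoning and tool calls.
    await asyncio.sleep(0.5)  # simulate a slow tool call spanning several micro-turns
    return f"[result for: {query}]"

async def interaction_loop(chunks, trigger, transcript):
    # Foreground loop: respond every micro-turn, never blocking on slow reasoning.
    pending = None
    for chunk in chunks:
        if chunk == trigger and pending is None:
            # Kick off longer reasoning without stalling the turn-taking loop.
            pending = asyncio.create_task(background_reasoner(chunk))
        if pending is not None and pending.done():
            transcript.append(pending.result())  # weave the result back in mid-conversation
            pending = None
        transcript.append(f"ack:{chunk}")        # immediate per-increment response
        await asyncio.sleep(CHUNK_MS / 1000)     # wait for the next 200 ms increment
    if pending is not None:
        transcript.append(await pending)         # flush any result still in flight

transcript = []
asyncio.run(interaction_loop(["hi", "weather?", "ok", "bye", "thanks"], "weather?", transcript))
```

Here the fast loop keeps acknowledging every increment while the background task finishes on its own schedule, so the slow result lands several micro-turns after the chunk that triggered it.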

Access is gated. Pricing wasn't disclosed. A limited research preview opens in the coming months, with a wider release planned for later in 2026.


Bottom Line

TML-Interaction-Small is a 276B-parameter MoE that reports a 0.40-second turn-taking latency on FD-bench V1, ahead of its rivals' self-reported times; its most striking qualitative results come from benchmarks the company built itself.

Quick Facts

  • Model: TML-Interaction-Small, 276B parameter MoE with 12B active
  • FD-bench V1 turn-taking latency: 0.40 seconds, company-reported
  • Gemini-3.1-flash-live (minimal): 0.57s; GPT-realtime-2.0 (minimal): 1.18s
  • Processing increments: 200 milliseconds per micro-turn
  • Announced May 11, 2026; wider release planned later in 2026
Tags: thinking machines, mira murati, interaction models, multimodal AI, real-time AI, voice AI
Andrés Martínez

AI Content Writer

Andrés reports on the AI stories that matter right now. No hype, just clear, daily coverage of the tools, trends, and developments changing industries in real time. He makes the complex feel routine.

