Thinking Machines Lab, the startup founded by former OpenAI CTO Mira Murati, announced a research preview of its "interaction models" on Monday. The company blog frames them as multimodal AI that listens, watches, and talks at the same time, instead of waiting for users to finish a thought before responding.
The flagship, TML-Interaction-Small, is a 276 billion parameter mixture-of-experts model with 12 billion active parameters. On FD-bench V1, an external interactivity benchmark, Thinking Machines reports a turn-taking latency of 0.40 seconds, versus 0.57 for Google's Gemini-3.1-flash-live and 1.18 for GPT-realtime-2.0 in their minimal settings. The numbers are self-reported.
Architecturally, the model drops the standard alternating-token sequence for a multi-stream design that processes inputs and emits outputs in 200ms increments. A separate background model handles longer reasoning and tool calls, then weaves results back in mid-conversation. "We've designed an AI that works with people the same way people collaborate," the company posted on X. That's the pitch; some of the more striking qualitative claims rest on internal benchmarks the company built itself, TimeSpeak and CueSpeak, where rival systems scored near zero.
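Thinking Machines hasn't published implementation details, but the described loop, consuming input and emitting output in fixed 200ms micro-turns while a background process feeds results back in, can be sketched in toy form. Everything below (function names, chunk format) is illustrative, not TML's actual API.

```python
from collections import deque

MICRO_TURN_MS = 200  # the 200 ms increment described in the announcement

def run_micro_turns(input_chunks, background_results=None):
    """Toy simulation of a multi-stream loop: each tick consumes one
    input chunk and emits a foreground reaction immediately, weaving
    finished background work (long reasoning, tool calls) back into
    the conversation as it becomes available.

    All names and structures here are hypothetical illustrations."""
    background = deque(background_results or [])
    transcript = []
    for tick, chunk in enumerate(input_chunks):
        t_ms = tick * MICRO_TURN_MS
        # Foreground stream: react to the latest input without waiting
        # for the speaker to finish their whole thought.
        transcript.append((t_ms, f"ack:{chunk}"))
        # Weave in any completed background result mid-conversation.
        if background:
            transcript.append((t_ms, f"bg:{background.popleft()}"))
    return transcript
```

The key contrast with an alternating-token design is that input consumption and output emission are interleaved on a fixed clock rather than strictly turn-by-turn, so a slow tool call never blocks the conversational stream.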
Access is gated. Pricing wasn't disclosed. A limited research preview opens in the coming months, with a wider release planned for later in 2026.
Bottom Line
TML-Interaction-Small is a 276B-parameter MoE that reports a 0.40-second turn-taking latency on FD-bench, ahead of rival systems from Google and OpenAI, though the numbers are self-reported and the most striking claims rest on the company's own benchmarks.
Quick Facts
- Model: TML-Interaction-Small, 276B-parameter MoE with 12B active
- FD-bench V1 turn-taking latency: 0.40 seconds, company-reported
- Gemini-3.1-flash-live (minimal): 0.57s; GPT-realtime-2.0 (minimal): 1.18s
- Processing increments: 200 milliseconds per micro-turn
- Announced May 11, 2026; wider release planned later in 2026