Inworld AI launched TTS-1.5 on January 21, a pair of new text-to-speech models that the company says deliver the fastest real-time voice synthesis currently available. The announcement puts P90 latency at under 130ms for the Mini model and under 250ms for Max, which Inworld claims is a 4x improvement over its previous generation.
The pricing is aggressive: $0.005 per minute for Mini, $0.01 per minute for Max. That works out to $5-10 per million characters, which Inworld says is 25x cheaper than competitors. The company hasn't named those competitors, though ElevenLabs and OpenAI are the obvious targets. Independent verification of that cost comparison isn't available.
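The per-minute and per-character figures above only line up under a particular speaking rate. A minimal sketch of that arithmetic follows, assuming roughly 1,000 characters of text per minute of generated audio. That rate is inferred from the two sets of numbers in the announcement, not a figure Inworld has published here, and the function name is purely illustrative.

```python
# Assumption: about 1,000 characters of text per minute of generated speech.
# This rate is implied by the article's figures ($0.005/min vs. $5 per million
# characters); it is not an officially stated conversion.
CHARS_PER_MINUTE = 1_000


def price_per_million_chars(price_per_minute: float,
                            chars_per_minute: int = CHARS_PER_MINUTE) -> float:
    """Convert a per-minute audio price into a per-million-characters price."""
    minutes_per_million_chars = 1_000_000 / chars_per_minute
    return price_per_minute * minutes_per_million_chars


# Per-minute prices quoted in the announcement.
for name, per_minute in {"Mini": 0.005, "Max": 0.01}.items():
    cost = price_per_million_chars(per_minute)
    print(f"{name}: ${cost:.2f} per million characters")
```

A different assumed speaking rate shifts the per-character price proportionally, which is one reason cross-vendor cost comparisons are hard to check without knowing how each side did the conversion.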
Quality metrics come from Inworld's own testing: a 40% lower word error rate and 30% greater expressiveness than TTS-1. The models hold the top positions on the Artificial Analysis TTS leaderboard, though that ranking appears to reflect the earlier TTS-1 models rather than 1.5 specifically. Layercode CEO Damien Tanner called the results "unmatched voice realism at a fraction of the cost," but his company is an integration partner.
TTS-1.5 supports 15 languages, with on-premise deployment for enterprise customers. Inworld has also open-sourced its training framework.
The Bottom Line: Inworld is betting that latency and price will win the TTS market; whether the quality claims hold under independent testing remains to be seen.
QUICK FACTS
- Latency (P90): <130ms (Mini), <250ms (Max)
- Pricing: $0.005/min (Mini), $0.01/min (Max)
- 15 languages supported
- 40% lower word error rate than TTS-1 (company-reported)
- Available via API, with on-prem deployment option




