Cerebras is now serving Moonshot AI's Kimi K2.6 at close to 1,000 tokens per second for enterprise customers, the chipmaker said this week on its company blog. It's the first trillion-parameter open-weight model the company has put into production.
Benchmarking firm Artificial Analysis clocked the actual output at 981 tokens per second, which Cerebras frames as 6.7x faster than the next-fastest GPU cloud. The 981 figure is a third-party measurement, not just Cerebras's own claim. For a 10,000-token coding request, the company says its endpoint returned a 500-token answer in 5.6 seconds, versus 163.7 seconds on Kimi's own endpoint.
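To put the quoted numbers side by side, here is a rough back-of-the-envelope sketch in Python. It uses only the figures above; the function names are illustrative, and it assumes the 981 tokens-per-second rate applies to the 500-token answer, which the article does not confirm.

```python
# Figures as quoted in the article; the latency pair is company-reported.
OUTPUT_TOKENS = 500          # size of the returned answer
THROUGHPUT_TPS = 981.0       # output tokens/sec measured by Artificial Analysis
CEREBRAS_LATENCY_S = 5.6     # end-to-end time on Cerebras
KIMI_LATENCY_S = 163.7       # end-to-end time on Kimi's own endpoint


def latency_speedup(baseline_s: float, candidate_s: float) -> float:
    """How many times faster the candidate finished than the baseline."""
    return baseline_s / candidate_s


def streaming_time(tokens: int, tokens_per_second: float) -> float:
    """Seconds needed to emit `tokens` at a steady output rate."""
    return tokens / tokens_per_second


speedup = latency_speedup(KIMI_LATENCY_S, CEREBRAS_LATENCY_S)
decode_s = streaming_time(OUTPUT_TOKENS, THROUGHPUT_TPS)

print(f"End-to-end speedup: {speedup:.1f}x")
print(f"Time to stream {OUTPUT_TOKENS} tokens at {THROUGHPUT_TPS:.0f} tok/s: {decode_s:.2f}s")
```

On these inputs the end-to-end gap works out to roughly 29x, while streaming 500 tokens at the measured rate would take only about half a second. The rest of the 5.6 seconds is not broken down in the reporting, so the throughput and latency figures should be read as separate measurements rather than one consistent benchmark.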
The catch: this is enterprise trials only. No public access yet, and Cerebras hasn't said when that changes.
Timing isn't accidental. Cerebras went public on May 14 and raised $5.5 billion, the largest U.S. tech IPO since Uber. Shares priced at $185 and jumped on day one, though the resulting valuation gets reported anywhere from $56 billion to $95 billion depending on how you count diluted shares. The trillion-parameter demo reads like a signal to Wall Street that the wafer-scale chips can handle frontier-scale models, not just mid-sized ones.
Cerebras has also tied itself to OpenAI through a multi-year compute deal worth more than $20 billion, running through 2028. K2.6 ranks near the top on coding benchmarks, but those scores are reported, not independently confirmed.
The one thing nobody has committed to is a public launch date. For now, K2.6 on Cerebras stays behind the enterprise wall.
Bottom Line
Cerebras hit 981 tokens per second on a trillion-parameter model, but only enterprise customers can use it.
Quick Facts
- 981 output tokens/sec, measured by Artificial Analysis
- Kimi K2.6: trillion-parameter open-weight model from Moonshot AI
- First trillion-parameter model Cerebras has served in production
- 10,000-token request answered in 5.6s vs 163.7s on Kimi's endpoint (company-reported)
- Cerebras IPO raised $5.5 billion, listed May 14, 2026




