Cerebras Runs Trillion-Parameter Kimi K2.6 Near 1,000 Tokens/Sec

First trillion-parameter model on Cerebras hardware, but enterprise customers only for now.

Andrés Martínez, AI Content Writer
May 22, 2026

Cerebras is now serving Moonshot AI's Kimi K2.6 at close to 1,000 tokens per second for enterprise customers, the chipmaker said this week on its company blog. It's the first trillion-parameter open-weight model the company has put into production.

Benchmarking firm Artificial Analysis clocked the actual output at 981 tokens per second, which Cerebras frames as 6.7x faster than the next-fastest GPU cloud. The throughput figure is a third-party measurement, not a Cerebras self-report. For a 10,000-token coding request, the company says it returned a 500-token answer in 5.6 seconds, versus 163.7 seconds on Kimi's own endpoint.
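To put those numbers side by side, here is a minimal back-of-envelope sketch in Python. It uses only the figures quoted above; the variable names are illustrative, and it assumes (which the article does not confirm) that the 981 tokens/sec measurement and the 5.6-second request were taken under comparable conditions.

```python
# Back-of-envelope arithmetic on the figures quoted in the article.
# All names are illustrative; none come from Cerebras or Artificial Analysis.

measured_tps = 981.0       # output tokens/sec, measured by Artificial Analysis
claimed_gpu_ratio = 6.7    # Cerebras' "6.7x faster than next-fastest GPU cloud"
answer_tokens = 500        # size of the answer in the coding example
cerebras_seconds = 5.6     # company-reported end-to-end time on Cerebras
kimi_seconds = 163.7       # company-reported end-to-end time on Kimi's endpoint

# End-to-end speedup implied by the two company-reported timings.
end_to_end_speedup = kimi_seconds / cerebras_seconds

# Time needed just to emit the answer at the measured output rate.
# Assumption: the request ran at roughly the benchmarked throughput.
decode_seconds = answer_tokens / measured_tps

# Whatever remains is everything other than emitting the visible answer
# (prompt processing, queueing, network, any hidden reasoning tokens).
other_seconds = cerebras_seconds - decode_seconds

# Output rate of the next-fastest GPU cloud implied by the 6.7x framing.
implied_gpu_tps = measured_tps / claimed_gpu_ratio

print(f"End-to-end speedup:        {end_to_end_speedup:.1f}x")
print(f"Time to emit 500 tokens:   {decode_seconds:.2f} s")
print(f"Remaining request time:    {other_seconds:.2f} s")
print(f"Implied GPU-cloud rate:    {implied_gpu_tps:.0f} tokens/sec")
```

Under those assumptions the two timings work out to roughly a 29x end-to-end gap, emitting the 500-token answer accounts for only about half a second of the 5.6, and the 6.7x framing implies a next-fastest GPU cloud at about 146 tokens per second. The end-to-end gap is much larger than 6.7x, which suggests the two comparisons are not measuring the same thing.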

The catch: this is enterprise trials only. No public access yet, and Cerebras hasn't said when that changes.

Timing isn't accidental. Cerebras went public on May 14 and raised $5.5 billion, the largest U.S. tech IPO since Uber. Shares priced at $185 and jumped on day one, though the resulting valuation gets reported anywhere from $56 billion to $95 billion depending on how you count diluted shares. The trillion-parameter demo reads like a signal to Wall Street that the wafer-scale chips can handle frontier-scale models, not just mid-sized ones.

Cerebras has also tied itself to OpenAI through a multi-year compute deal worth more than $20 billion, running through 2028. K2.6 ranks near the top on coding benchmarks, but those scores are reported, not independently confirmed.

What nobody has committed to is a public launch date. For now, K2.6 on Cerebras stays behind the enterprise wall.


Bottom Line

Cerebras hit 981 tokens per second on a trillion-parameter model, but only enterprise customers can use it.

Quick Facts

  • 981 output tokens/sec, measured by Artificial Analysis
  • Kimi K2.6: trillion-parameter open-weight model from Moonshot AI
  • First trillion-parameter model Cerebras has served in production
  • 10,000-token request answered in 5.6s vs 163.7s on Kimi's endpoint (company-reported)
  • Cerebras IPO raised $5.5 billion, listed May 14, 2026
Tags: Cerebras, Kimi K2.6, AI inference, Moonshot AI, AI chips, IPO, agentic coding
Cerebras Runs Kimi K2.6 at Near 1,000 Tokens Per Second | aiHola