Zyphra released ZAYA1-8B this week, a mixture-of-experts model with 8.4 billion total parameters but only 760 million active per token. The company claims it matches much larger open-weight rivals on math and coding benchmarks. The differentiator: trained entirely on AMD hardware.
Per Zyphra's research post, ZAYA1-8B was pretrained, midtrained, and supervised fine-tuned on a 1,024-GPU AMD Instinct MI300X cluster with AMD Pensando Pollara interconnect, built on IBM Cloud. No NVIDIA in the stack. For a serious reasoning model in 2026, that's still rare.
Zyphra reports 89.6 on HMMT'25 versus 88.3 for Claude 4.5 Sonnet and GPT-5-High. The numbers are self-reported and depend on a new test-time compute method the company calls Markovian RSA, which spawns parallel reasoning traces and recursively aggregates their tail segments to keep context bounded. Under extra-high compute (5.5M tokens per problem), Zyphra says ZAYA1-8B also tops DeepSeek-V3.2 and GPT-OSS-120B High on APEX-shortlist.
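That one-line description is all Zyphra has shared so far, so the sketch below is only a minimal illustration of the general pattern it implies (spawn several traces, keep only their tails, fold them into the next round's prompt), with a hypothetical `generate` callable standing in for the model call; it is not Zyphra's Markovian RSA implementation.

```python
from typing import Callable

def recursive_tail_aggregation(
    problem: str,
    generate: Callable[[str], str],  # hypothetical model call returning one reasoning trace
    width: int = 4,                  # parallel traces per round (illustrative value)
    rounds: int = 3,                 # aggregation rounds (illustrative value)
    tail_chars: int = 2000,          # how much of each trace's tail to carry forward
) -> str:
    """Toy version of 'parallel traces, recursively aggregate the tails':
    each round's prompt holds only the problem plus bounded tail segments,
    so context never grows with the full length of every trace."""
    prompt = problem
    for _ in range(rounds):
        traces = [generate(prompt) for _ in range(width)]  # spawn parallel traces
        tails = [t[-tail_chars:] for t in traces]          # keep only bounded tails
        prompt = problem + "\n\nCandidate endings:\n" + "\n---\n".join(tails)
    return generate(prompt)                                # final consolidated trace
```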
Architecture-wise, ZAYA1-8B layers in Compressed Convolutional Attention (CCA), an MLP-based expert router with PID-controller bias balancing, and learned residual scaling. CCA cuts KV-cache memory by 8x versus standard attention, per the company.
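Zyphra's post gives the 8x figure but not the full CCA math. As a rough intuition for where cache savings of that size come from, here is a toy single-head decode step that caches keys and values only after projecting them into a smaller latent dimension; the 4096-to-512 sizes are assumptions for illustration, not ZAYA1's real dimensions, and this is not the actual CCA design.

```python
import torch
import torch.nn.functional as F

# Toy decode step with a compressed KV cache. Dimensions are made up for
# illustration; this is NOT Zyphra's Compressed Convolutional Attention.
d_model, d_latent = 4096, 512                     # 8x smaller cache entries (assumed sizes)
W_q = torch.randn(d_model, d_latent) / d_model**0.5
W_k = torch.randn(d_model, d_latent) / d_model**0.5
W_v = torch.randn(d_model, d_latent) / d_model**0.5
W_o = torch.randn(d_latent, d_model) / d_latent**0.5

def decode_step(x_t, k_cache, v_cache):
    """Append one token's down-projected K/V, then attend in the latent space.
    The cache stores d_latent floats per token instead of d_model, which is
    where the memory reduction comes from."""
    k_cache = torch.cat([k_cache, x_t @ W_k], dim=0)
    v_cache = torch.cat([v_cache, x_t @ W_v], dim=0)
    q = x_t @ W_q
    attn = F.softmax(q @ k_cache.T / d_latent**0.5, dim=-1)
    return attn @ v_cache @ W_o, k_cache, v_cache

# Usage: start with empty caches and feed one token embedding at a time.
k_cache = torch.empty(0, d_latent)
v_cache = torch.empty(0, d_latent)
out, k_cache, v_cache = decode_step(torch.randn(1, d_model), k_cache, v_cache)
```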
CEO Krithik Puthalath called the result a demonstration of "maximizing the intelligence extracted per parameter and per FLOP," which is also the standard line for any efficiency-focused model release. Independent benchmarks haven't landed yet.
Weights are live on Hugging Face under Apache-2.0. The serverless endpoint runs on Zyphra Cloud.
Bottom Line
ZAYA1-8B is, per Zyphra, the first MoE model pretrained, midtrained, and SFT'd entirely on AMD's MI300X stack, with weights now on Hugging Face under Apache-2.0.
Quick Facts
- Parameters: 8.4B total, 760M active per token
- Training cluster: 1,024 AMD Instinct MI300X GPUs with AMD Pensando Pollara interconnect, built on IBM Cloud
- HMMT'25 score: 89.6 (company-reported); Claude 4.5 Sonnet and GPT-5-High at 88.3
- KV-cache compression: 8x via Compressed Convolutional Attention (company-reported)
- License: Apache-2.0; available on Hugging Face and Zyphra Cloud