Essential AI just dropped Rnj-1, an 8-billion-parameter open-weights model built for code and STEM tasks. The startup was founded by Ashish Vaswani, a co-author of the "Attention Is All You Need" paper that introduced the Transformer architecture. The model's name pays homage to mathematician Srinivasa Ramanujan.
The headline number: Rnj-1 Instruct scores 20.8% on SWE-bench Verified in bash-only mode. That puts it ahead of Gemini 2.0 Flash and Qwen2.5-Coder 32B Instruct under the same agentic framework. For context, Qwen3 8B (with thinking disabled) manages just 4.5% on the same benchmark.
Essential AI took a contrarian approach to training. The team deliberately minimized post-training and reinforcement learning, betting instead on high-quality pre-training. Rnj-1 was pre-trained on 8.4 trillion tokens using the Muon optimizer rather than the standard AdamW. The architecture follows Gemma 3, modified to use global self-attention; YaRN scaling extends the context window to 32K tokens.
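For readers unfamiliar with YaRN: it extends a model's context window by rescaling the rotary-embedding (RoPE) frequencies so positions beyond the pre-training window still map into a familiar range. Rnj-1's exact RoPE settings aren't given here, so the sketch below uses illustrative defaults (a hypothetical 8K original window stretched 4x to 32K) just to show the mechanism: high-frequency dimensions are left alone, low-frequency dimensions are interpolated, with a linear ramp in between.

```python
import math

def rope_inv_freq(dim, base=10000.0):
    """Standard RoPE inverse frequencies for head dimension `dim`."""
    return [base ** (-2.0 * i / dim) for i in range(dim // 2)]

def yarn_inv_freq(dim, scale, orig_ctx, base=10000.0,
                  beta_fast=32.0, beta_slow=1.0):
    """YaRN-style frequency interpolation (illustrative parameters only).

    Dims that complete many rotations within the original window are left
    untouched (extrapolation); dims that complete few rotations are divided
    by `scale` (interpolation); dims in between get a linear blend.
    """
    out = []
    for f in rope_inv_freq(dim, base):
        rotations = orig_ctx * f / (2 * math.pi)  # full turns over original context
        t = (rotations - beta_fast) / (beta_slow - beta_fast)
        t = min(1.0, max(0.0, t))  # 0 = pure extrapolation, 1 = pure interpolation
        out.append(f * ((1.0 - t) + t / scale))
    return out

# Hypothetical example: stretch an assumed 8K window 4x to reach 32K.
scaled = yarn_inv_freq(dim=128, scale=4.0, orig_ctx=8192)
```

The point of the ramp is that fine-grained positional detail (fast-rotating dims) survives unchanged, while only the slow dims that encode long-range position get compressed.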
Both base and instruct versions ship under the Apache 2.0 license. Weights are live on Hugging Face, with hosted inference available through Together AI. The company says it kept post-training minimal so the community can specialize the model further.
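Together AI serves hosted models through an OpenAI-compatible chat-completions API, so trying the instruct model is a single authenticated POST. A minimal sketch follows; note the model slug below is a guess, not the confirmed identifier, so check Together's model catalog for the real one.

```python
import json

# Hypothetical identifier -- look up the actual slug in Together AI's catalog.
MODEL_ID = "essential-ai/Rnj-1-Instruct"

def build_chat_request(prompt: str, max_tokens: int = 512) -> dict:
    """Build an OpenAI-compatible chat-completions payload."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

# POST this payload (with an "Authorization: Bearer <TOGETHER_API_KEY>" header)
# to Together's chat-completions endpoint to query the hosted model.
payload = build_chat_request("Write a bash one-liner that counts .py files in a repo.")
print(json.dumps(payload, indent=2))
```

Because the API follows the OpenAI wire format, existing OpenAI client libraries can be pointed at Together's base URL instead of hand-building requests.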
The Bottom Line: An 8B model matching or beating 32B competitors on agentic coding tasks suggests pre-training quality matters more than parameter count.
QUICK FACTS
- Model size: 8.3 billion parameters
- Training data: 8.4 trillion tokens (pre-training) + 380B (context extension) + 150B (SFT)
- SWE-bench Verified score: 20.8% (bash-only)
- Qwen3 8B comparison: 4.5%
- Context length: 32K tokens
- License: Apache 2.0