Benchmark

2 articles tagged with "Benchmark"

Trillion-parameter reasoning model uses adaptive tool calling and test-time compute to rival frontier models.

Andrés Martínez · Jan 26, 2026 · 2 min

Anthropic releases its notoriously difficult performance take-home exam on GitHub. Claude Opus 4.5 beat every human candidate.

Oliver Senti · Jan 21, 2026 · 4 min
