SEO METADATA
Meta Title: Anthropic Open-Sources Engineering Test That Claude 4.5 Solved Faster Than Humans
Meta Description: Anthropic releases its notoriously difficult performance take-home exam on GitHub. Claude Opus 4.5 beat every human candidate. Beat 1,487 cycles to get noticed.
URL Slug: anthropic-performance-takehome-opensource
Primary Keyword: Anthropic engineering test
Secondary Keywords: Claude Opus 4.5, performance optimization, AI coding test, Anthropic hiring
Tags: ["Anthropic", "Claude", "AI coding", "engineering interview", "open source", "performance optimization", "kernel optimization", "hiring", "benchmark"]
ARTICLE
Anthropic Releases the Take-Home Test Claude 4.5 Solved Better Than Any Human
The AI lab open-sources its performance engineering challenge after its own model made the test obsolete for hiring.
Anthropic has published its internal performance engineering take-home exam to GitHub, making public a test the company stopped using because Claude Opus 4.5 outperformed every human candidate who ever attempted it.
The challenge nobody asked for
The task sounds deceptively simple: optimize a kernel running on a simulated multi-core machine, measured in clock cycles. Candidates had two hours. The baseline implementation runs at 147,734 cycles. The best human performance in the allotted time: around 1,790 cycles.
Claude Opus 4.5, in what Anthropic describes as a "casual Claude Code session," matched that human benchmark. Given two hours in Anthropic's test-time compute harness, the model hit 1,579 cycles. After 11.5 hours, it reached 1,487, roughly a hundredfold improvement over the baseline.
The repository includes the full simulated machine architecture, a reference kernel, and a trace viewer for debugging. Hacker News commenters were quick to note the resemblance to demoscene code golf, the niche community where programmers compete to produce the smallest or fastest code for aesthetic demos.
"It's designed to select for people who can be trusted to manually write PTX," one commenter observed, referencing NVIDIA's low-level GPU assembly language.
What the test actually requires
The code presents a deliberately confusing implementation that candidates must reverse-engineer before they can improve it. The simulated machine has multiple cores, vector operations, and a scratch memory system. Candidates work in Python, but the optimization work resembles GPU kernel tuning.
According to the task description, the goal is to minimize cycles by rewriting the `KernelBuilder.build_kernel` function. The test includes a frozen copy of the simulator to prevent gaming the measurement system.
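To make the optimization target concrete, here is a toy Python sketch: a machine that charges one cycle per instruction, and the same summation kernel written with scalar and vector instructions. The instruction names and the 8-wide vector are invented for illustration; they are not the repository's actual API.

```python
# Toy illustration only -- none of these names come from Anthropic's repo.
# The point is the shape of the problem: fewer, denser instructions mean
# fewer cycles on a machine that charges one cycle per instruction.
VECTOR_WIDTH = 8

def run(instructions):
    """Execute (op, args) tuples on the toy machine, charging 1 cycle each."""
    cycles, acc = 0, 0
    for op, args in instructions:
        if op == "add":        # scalar add: one element per instruction
            acc += args[0]
        elif op == "vadd":     # vector add: up to VECTOR_WIDTH elements at once
            acc += sum(args)
        cycles += 1
    return acc, cycles

def scalar_kernel(data):
    return [("add", (x,)) for x in data]

def vector_kernel(data):
    return [("vadd", tuple(data[i:i + VECTOR_WIDTH]))
            for i in range(0, len(data), VECTOR_WIDTH)]

data = list(range(1024))
print(run(scalar_kernel(data)))   # (523776, 1024): 1,024 cycles
print(run(vector_kernel(data)))   # (523776, 128):  128 cycles
```

Packing eight elements of work into each instruction cuts the toy cycle count by a factor of eight; the real test layers multiple cores and scratch memory on top of the same idea.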
This isn't a LeetCode problem. There's no single algorithm to apply. The Hacker News discussion made that clear: one commenter noted that "packing vectors right" was proving difficult, while others debated whether the test favored rote knowledge of optimization patterns or genuine insight.
Anthropic's position appears to be: both matter, and the former is increasingly automatable.
Why release it now
The company's November 2025 blog post announcing Opus 4.5 highlighted the internal test results. Using parallel test-time compute (running multiple solution attempts and selecting the best), Opus 4.5 beat every human score in company history. Without that technique and without time limits, it tied the best-ever human.
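Anthropic hasn't published the harness itself, but the core idea of parallel test-time compute is simple to sketch: generate several independent attempts, score each one, keep the best. Everything below (`generate_attempt`, `count_cycles`, the random cycle counts) is a placeholder for illustration, not Anthropic's code.

```python
# Minimal best-of-N sketch: run several independent attempts in parallel
# and keep the one with the lowest cycle count. Placeholders stand in for
# the model call and the frozen simulator.
from concurrent.futures import ProcessPoolExecutor
import random

def generate_attempt(seed):
    """Placeholder for 'ask the model for one candidate kernel'."""
    rng = random.Random(seed)
    return {"kernel": f"attempt-{seed}", "cycles": rng.randint(1400, 2200)}

def count_cycles(attempt):
    """Placeholder for 'run the frozen simulator on this kernel'."""
    return attempt["cycles"]

def best_of_n(n=16):
    # Launch n independent attempts, then select the minimum-cycle result.
    with ProcessPoolExecutor() as pool:
        attempts = list(pool.map(generate_attempt, range(n)))
    return min(attempts, key=count_cycles)

if __name__ == "__main__":
    best = best_of_n()
    print(best["kernel"], best["cycles"])
```

Selecting the minimum over many independent attempts is why the table below distinguishes "test-time compute" runs from a casual single session.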
Releasing the test now serves multiple purposes. It's a recruiting tool: Anthropic explicitly invites anyone who can beat 1,487 cycles to email [email protected]. It's also a statement about where AI capabilities are heading for technical work.
The caveats matter. Anthropic acknowledged the test doesn't measure collaboration, communication, or professional judgment. A two-hour kernel optimization exercise says nothing about whether someone can design systems over months or navigate organizational complexity. But it does measure something real, and that something is apparently within reach for current AI systems.
The scoreboard so far
The repository documents benchmark progression across Claude models:
| Model | Cycles | Notes |
|---|---|---|
| Claude Opus 4 | 2,164 | Many hours, test-time compute |
| Claude Opus 4.5 | 1,790 | Casual session, matched best human |
| Claude Opus 4.5 | 1,579 | 2 hours, test-time compute |
| Claude Sonnet 4.5 | 1,548 | Many hours, test-time compute |
| Claude Opus 4.5 | 1,487 | 11.5 hours, test-time compute |
| Claude Opus 4.5 | 1,363 | Improved compute harness |
Anyone can now attempt the challenge with unlimited time. The implicit question: given the same unlimited-time advantage the models had, can humans still come out ahead?
The test runs locally in Python. Run `python tests/submission_tests.py` to see which thresholds your solution passes.
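If you'd rather script that check while iterating on a kernel, a small wrapper like the sketch below works; the only repository-specific detail it relies on is the test path quoted above.

```python
# Run the submission checker from a script and surface its output and
# exit code, so repeated checks can be automated while tuning a kernel.
import subprocess
import sys

result = subprocess.run(
    [sys.executable, "tests/submission_tests.py"],
    capture_output=True, text=True,
)
print(result.stdout, end="")
print(result.stderr, end="", file=sys.stderr)
sys.exit(result.returncode)
```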




