Adaption Labs launched AutoScientist on May 13, an automated fine-tuning system that co-optimizes training data and model recipes until output converges on a target behavior. In its launch post, the San Francisco startup says win rates jumped from 48% to 64% against configurations picked by Adaption's own AI researchers, a roughly 33% relative improvement measured on internal benchmarks across eight verticals.
The setup deserves a closer read.
Better than its own researchers
According to the launch post, Adaption's in-house researchers set fine-tuning configurations with full knowledge of model type, domain and dataset size. AutoScientist got the same information plus the ability to self-improve from a limited history of past runs. Both were evaluated on in-house, domain-specialized tests against models hosted by Together AI, with dataset sizes from 5,000 to 100,000 examples.
If you're keeping score: Adaption picked the test, picked the metric, picked the baseline, and ran the comparison. The 35% gap is the gap Adaption's system has over Adaption's humans on tasks Adaption designed. That doesn't make it wrong. It does make it unverifiable until customers run their own data through the system.
To Adaption's partial credit, the company concedes this directly. Standard public benchmarks like SWE-Bench or ARC-AGI don't apply because AutoScientist isn't tuning for general capability. It's tuning for whatever specific behavior a user describes. So how do you grade a system whose entire job is bespoke output? There is no public scoreboard for "did this fine-tune actually do what I asked," which is the problem AutoScientist is trying to solve, and also the reason the win-rate claim is hard to interrogate from outside.
The "less than a thousand people" pitch
"Less than a thousand people in the world know how to shape a frontier model," Adaption's launch post claims (a number worth squinting at, though directionally the gist holds). That's the marketing wedge: most enterprises end up doing prompt engineering because actually fine-tuning is hard, fails often, and demands expertise locked inside a handful of labs. AutoScientist is pitched as the tool that gets developers from idea to "an owned, adapted model in an afternoon, not weeks."
CEO Sara Hooker, formerly VP of AI research at Cohere and a Google Brain alumna best known for her 2020 Hardware Lottery paper, told TechCrunch the system "co-optimizes both the data and the model, and learns the best way to basically learn any capability." And whether that holds outside Adaption's grid is what the next 30 days are supposed to answer.
Hooker and co-founder Sudip Roy, who ran inference at Cohere, started Adaption in 2025 and raised $50 million in February from Emergence Capital, Mozilla Ventures, Threshold Ventures and others. The company is part of a small cohort of post-Cohere, post-DeepMind founders arguing that AI's bottleneck isn't compute. It's training expertise.
Who else is selling this pitch?
Adaption isn't alone. Mira Murati's Thinking Machines raised $2 billion at a $12 billion valuation last July and shipped Tinker in October, an API that automates parts of fine-tuning across frontier-class open models. Hugging Face shipped a related tool late last year, letting Claude fine-tune open-source models for about thirty cents a run, capped at 7B parameters. The pitches rhyme. The price tags don't: Adaption's seed round closed at roughly one-fortieth of what Murati's company raised seven months earlier.
Whether that's a discount or a warning depends on what the customer trials surface.
What the system actually does
AutoScientist sits on top of Adaptive Data, Adaption's earlier product for assembling fine-tuning datasets. The new system runs the full training loop end-to-end, iterating on data selection and hyperparameter choices in lockstep until the model converges on the described objective. Adaption frames this as addressing three classic fine-tuning failure modes: catastrophic forgetting, overfitting on small datasets, and conflicting training signals.
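What that loop might look like in the abstract: the sketch below is illustrative only, not Adaption's code. The helpers (propose_hyperparams, reweight_data, train_and_score) are dummy stand-ins invented here so the toy runs end to end; a real system would condition each step on far richer signals.

    import random

    def propose_hyperparams(history):
        # Stand-in for the recipe step; a real system would learn from history.
        return {"lr": random.choice([1e-5, 3e-5, 1e-4]),
                "epochs": random.choice([1, 2, 3])}

    def reweight_data(dataset, history):
        # Stand-in for data selection: here, an uninformed subsample.
        return random.sample(dataset, k=max(1, len(dataset) // 2))

    def train_and_score(data, config, objective):
        # Stand-in for a fine-tune plus an evaluation against the objective.
        return random.random()

    def co_optimize(dataset, objective, budget=20):
        best, history = {"score": -1.0, "config": None}, []
        for _ in range(budget):
            config = propose_hyperparams(history)      # recipe step
            subset = reweight_data(dataset, history)   # data step, in lockstep
            score = train_and_score(subset, config, objective)
            history.append((config, score))            # the "limited history of past runs"
            if score > best["score"]:
                best = {"score": score, "config": config}
        return best

    print(co_optimize(list(range(1000)), "refuse off-topic requests"))

The point of the structure, as pitched, is that data and recipe move together rather than one being frozen while the other is searched.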
Those failure modes are real. But whether automated co-optimization addresses them better than a competent ML engineer running a hyperparameter sweep on actual production data, under actual production constraints, is something Adaption's internal benchmarks can't settle. I'd want to see results on data the company has never touched before drawing strong conclusions either way.
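For contrast, the human baseline here looks roughly like a plain sweep: data held fixed, recipe searched. A toy version, reusing the train_and_score stand-in from the sketch above:

    from itertools import product

    def grid_sweep(dataset, objective):
        lrs, epoch_counts = [1e-5, 3e-5, 1e-4], [1, 2, 3]
        results = []
        for lr, epochs in product(lrs, epoch_counts):
            config = {"lr": lr, "epochs": epochs}
            # The dataset never changes; only the hyperparameters vary.
            results.append((train_and_score(dataset, config, objective), config))
        return max(results, key=lambda r: r[0])[1]

Whether the extra degree of freedom on the data side is worth the claimed 16 points of win rate is the open question.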
The trial is the test
AutoScientist is free for 30 days. Adaption has a financial incentive to extend that window if early users churn, and a research incentive to publicize specific success cases if they don't. Either response will tell you more than the launch numbers do.
The company has also signaled what comes next: real-time adaptation that updates model behavior without a traditional training loop at all. Hooker has been talking about that ambition since the company's founding, framing continuous learning as "the most important problem" she's worked on. That capability isn't in the product yet; AutoScientist is the first piece.
The trial expires in mid-June. That's roughly when the first customer post-mortems start showing up.