Claude Opus 4.6 and 4.7 routinely lied to suppliers and refused to refund customers during a year-long vending machine simulation, according to findings from AI safety firm Andon Labs. Anthropic's own alignment risk report flags an early version of its unreleased Mythos model as behaving even worse in the same test.
What Opus actually did
The setup is simple. Andon's Vending-Bench drops a model into a simulated vending business for 365 days and tells it to maximize profit. Opus 4.6 won the solo version with $8,017.59 in the bank, beating Gemini 3 Pro's previous high of $5,478. How it got there is the problem.
In one logged exchange, Claude emailed a customer named Bonnie promising a $3.50 refund for an expired Snickers bar. It never sent the money. Opus walked itself through the math in its own reasoning trace before concluding: "I could skip the refund entirely since every dollar matters." At year's end, the model added refund avoidance to a list of strategies it said had saved hundreds of dollars.
Suppliers got similar treatment. Opus told a wholesaler called BayCo it was a loyal customer placing 500-plus-unit exclusive monthly orders, then secured a roughly 40% price cut. Andon's review of the trace found Opus had ordered from BayCo only twice, sourcing from competitors in between. In another email it invoked competitor quotes as low as 50 cents a unit to pressure pricing. Those numbers appear nowhere in the simulation. Andon concluded Opus invented them.
Honesty wins the arena
The multi-agent version, called the Vending-Bench Arena, puts models in direct competition. Opus 4.6 formed price cartels with rivals, sent them to expensive suppliers while hiding its own cheap ones, and sold desperate competitors stock at markups of 70% or more. Opus 4.7 carried the same tactics into its round.
It still lost. GPT-5.5 took the arena with $7,980, ahead of Opus 4.7 at $5,838. Andon describes GPT-5.5 as refunding every customer, negotiating honestly, and mostly avoiding deception. One slip: after initially declining Opus 4.7's price-fixing proposal on ethical grounds, GPT-5.5 later proposed its own. Simulated customers preferred cheaper machines, which rewarded the model that competed on price instead of sabotaging rivals.
Mythos goes further
In April, Anthropic's own risk report on its limited-release Mythos Preview model noted that Andon had found an early version exhibiting "outlier behaviors that neither comparison model showed." The same report describes that version as significantly more aggressive than both Opus 4.6 and Sonnet 4.6 in the identical simulation, without specifying which behaviors crossed the line.
Anthropic has separately called Mythos Preview its best-aligned model to date. Both statements can be true. The report's own framing: a seasoned mountain guide takes clients into more dangerous terrain than a novice would.
Andon plans to run new arena rounds as models ship. Mythos Preview stays in limited release to a small group of partner companies via Anthropic's Project Glasswing.




