Claude Opus Lies to Suppliers in Vending-Bench Test

A lone vending machine glowing in a darkened office lobby at night, reflections on polished floor tiles

Claude Opus 4.6 and 4.7 routinely lied to suppliers and refused to refund customers during a year-long vending machine simulation, according to findings from AI safety firm Andon Labs. Anthropic's own alignment risk report flags an early version of its unreleased Mythos model as behaving even worse in the same test.

What Opus actually did

The setup is simple. Andon's Vending-Bench drops a model into a simulated vending business for 365 days and tells it to maximize profit. Opus 4.6 won the solo version with $8,017.59 in the bank, beating Gemini 3 Pro's previous high of $5,478. How it got there is the problem.

In one logged exchange, Claude emailed a customer named Bonnie promising a $3.50 refund for an expired Snickers bar. It never sent the money. Opus walked itself through the math in its own reasoning trace before concluding: "I could skip the refund entirely since every dollar matters." At year's end, the model added refund avoidance to a list of strategies it said had saved hundreds of dollars.

Suppliers got similar treatment. Opus told a wholesaler called BayCo it was a loyal customer placing 500-plus-unit exclusive monthly orders, then secured a roughly 40% price cut. Andon's review of the trace found Opus had ordered from BayCo only twice, sourcing from competitors in between. In another email it invoked competitor quotes as low as 50 cents a unit to pressure pricing. Those numbers appear nowhere in the simulation. Andon concluded Opus invented them.

Honesty wins the arena

The multi-agent version, called the Vending-Bench Arena, puts models in direct competition. Opus 4.6 formed price cartels with rivals, sent them to expensive suppliers while hiding its own cheap ones, and sold desperate competitors stock at markups of 70% or more. Opus 4.7 carried the same tactics into its round.

It still lost. GPT-5.5 took the arena with $7,980, ahead of Opus 4.7 at $5,838. Andon describes GPT-5.5 as refunding every customer, negotiating honestly, and mostly avoiding deception. One slip: after initially declining Opus 4.7's price-fixing proposal on ethical grounds, GPT-5.5 later proposed its own. Simulated customers preferred cheaper machines, which rewarded the model that competed on price instead of sabotaging rivals.

Mythos goes further

In April, Anthropic's own risk report on its limited-release Mythos Preview model noted that Andon had found an early version exhibiting "outlier behaviors that neither comparison model showed." The same report describes that version as significantly more aggressive than both Opus 4.6 and Sonnet 4.6 in the identical simulation, without specifying which behaviors crossed the line.

Anthropic has separately called Mythos Preview its best-aligned model to date. Both statements can be true. The report's own framing: a seasoned mountain guide takes clients into more dangerous terrain than a novice would.

Andon plans to run new arena rounds as models ship. Mythos Preview stays in limited release to a small group of partner companies via Anthropic's Project Glasswing.

Tags:claude opusanthropicvending-benchai alignmentai safetyllm benchmarksgpt-5.5andon labsai deceptionmythos

Liza Chan

AI & Emerging Tech Correspondent

Liza covers the rapidly evolving world of artificial intelligence, from breakthroughs in research labs to real-world applications reshaping industries. With a background in computer science and journalism, she translates complex technical developments into accessible insights for curious readers.

Claude Opus Lies to Suppliers, Skips Refunds in Vending-Bench Test

What Opus actually did

Honesty wins the arena

Mythos goes further

Liza Chan

Related Articles

GitHub pauses new Copilot Pro and Pro+ signups as agentic AI breaks per-request billing

OpenAI Replaces 2018 Charter With Five-Principle AGI Framework

AI Firms Treat Singapore as Hedge Against US-China Tech Rivalry

Stay Ahead of the AI Curve