Claude Opus 4.7 Cuts Sycophancy on Relationship Advice

Roughly six percent of conversations on claude.ai are people asking the chatbot what to do with their lives, and Claude has been quietly agreeing with them too often. That's the takeaway from a research post Anthropic published Thursday, drawing on a million sampled conversations from March and April 2026. The fix shipped in Claude Opus 4.7, the company's latest generally available model, where sycophancy on relationship questions reportedly dropped by half.

What people actually ask

Anthropic ran its privacy tool Clio across roughly 639,000 unique-user conversations and isolated about 38,000 where someone wanted Claude to weigh in on a personal decision. Three quarters clustered into four buckets: health and wellness (27%), careers (26%), relationships (12%), and personal finance (11%). The rest (parenting, ethics, legal, spirituality, self-development) made up the long tail.

The numbers describe Claude users, not the general public, and Anthropic acknowledges as much. Whether the same distribution holds across, say, ChatGPT or Gemini is anyone's guess.

The relationship problem

Sycophancy showed up in 9% of guidance chats overall. The rate climbed to 25% on relationship questions and a startling 38% on spirituality. Anthropic chose to focus training on relationships because of sheer volume; spiritual chats, while more sycophantic per capita, are rarer.

The mechanism is straightforward enough. People bringing relationship problems push back against Claude's initial take 21% of the time, compared to 15% in other domains. When they do push back, the rate of sycophantic response doubles, from 9% to 18%. A model trained to be empathetic, hearing only one side of a fight, with the user pressing for validation: no one should be surprised by the result.

"Agreeing outright that the other party was in the wrong, despite only having the user's account," is how Anthropic itself describes the failure mode. Candid framing, for a company writing about its own product. The other example they flag: helping users read romantic intent into ordinary friendly behavior because they asked.

The fix, and a caveat

For Opus 4.7 and the unreleased Mythos Preview, Anthropic generated synthetic relationship scenarios based on the pushback patterns it had identified, then graded sample responses against Claude's Constitution using a separate Claude instance. They evaluated improvements through prefilling: feeding the new model real prior conversations where earlier Claudes had behaved badly, then watching whether it could course-correct mid-stream. They report sycophancy dropped roughly in half on relationship guidance in Opus 4.7 versus Opus 4.6, with gains generalizing to other domains.

Clean result. The methodology has Claude grading another Claude, which Anthropic notes as a limitation, but worth holding lightly until independent benchmarks weigh in.

Does any of it matter?

A recent AISI study from the UK government's AI Security Institute found that 75% of 2,302 UK participants who had a 20-minute advice session with GPT-4o said they later followed the model's recommendations. For severe issues and high-stakes calls, the rate was around 60%. Whether following that advice actually improved anyone's life was another matter: the study found a brief well-being bump that faded within two to three weeks, no different from chatting about a hobby.

People are listening. Whether they should be is the harder question Anthropic's paper doesn't try to answer. Opus 4.7 is generally available now via API, Amazon Bedrock, and Google Cloud Vertex; the company says next-step evaluations in high-stakes domains like medical, legal, and financial advice are on the roadmap.

Anthropic Says Claude Opus 4.7 Halved Sycophancy in Relationship Advice

What people actually ask

The relationship problem

The fix, and a caveat

Does any of it matter?

Oliver Senti

Related Articles

TIME Names 10 Most Influential AI Companies of 2026

Anthropic's Tokenizer Triples Hindi Token Costs, Test Suggests

DeepSeek Cuts Input Cache Prices 90% Across All API Models

Stay Ahead of the AI Curve