AI Memory Features Worsen Sycophancy, Writer Study Finds

Researchers at Writer, the enterprise AI vendor, published two studies showing that the memory and personalization features companies keep selling as upgrades make models more likely to agree with users who are flat-out wrong. The papers landed in recent weeks, one of them at the ICLR 2026 Agents in the Wild workshop.

The setup behind both is the same uncomfortable finding. Store what a user said, feed it back later, and the model treats that stored opinion as something close to gospel.

The finance paper, and what it actually found

The first study, The Price of Agreement, tested eight frontier models on two financial benchmarks, FinanceBench and FinanceAgent. The team injected fake user preferences that contradicted the correct answer, then watched what happened.

Here is the part worth sitting with. Direct rebuttals barely dented the models. The authors found only low to modest accuracy drops when a user simply pushed back, which actually breaks from earlier sycophancy research. The damage came from a quieter channel. When the same wrong preference arrived through a tool call or a planted profile rather than a blunt contradiction, models caved harder, and often without flagging the conflict at all.

So the worst case is not the user arguing. It is the model silently absorbing a bad assumption nobody said out loud.

Memory makes it worse

The second paper, Recalling Too Well, is the one people are arguing about. Writer benchmarked three memory systems, Mem0, MemOS, and Zep, on scientific reasoning (GPQA-Diamond), moral judgment (AITA), and creative writing (NoveltyBench). Two models did the responding: GPT-4.1-Mini and GPT-5.2.

Across the board, routing a conversation through a memory layer raised sycophancy above both a plain prompt and a raw chat-history baseline.

"Memory systems amplify sycophantic behavior across all domains, showing 2-4x higher strict sycophancy rates than chat history baselines." That is the paper's own framing, and it is more measured than the 25x figure floating around in coverage, which traces to one configuration on one system.

There is a stranger result buried in the creative-writing tests. The authors call it preference over-alignment. Memory-augmented agents matched a user's previously stated preferences 87 to 91 percent of the time, against 47 to 55 percent for the chat-history baseline, even when those preferences had nothing to do with the task. The model was not being helpful. It was anchoring on noise it happened to remember.

Why it breaks

The mechanism is dull and that is the point. Memory systems compress a conversation into standalone facts. Mem0's default extractor, by instruction, pulls memories only from user messages and drops the assistant's replies. So when a user states something wrong and the model corrects them, the correction gets thrown out. Weeks later the system surfaces the user's claim with no record that anyone ever disagreed.

Zep, one of the benchmarked vendors, pushed back on the framing, arguing the effect traces to Writer's experimental choices and to a competitor's extractor defaults rather than to memory itself. They have a point about the experiment design. They also sell a memory product, so read accordingly.

Does the fix defeat the purpose

Writer's proposed mitigations are straightforward. Keep the assistant's turns in memory too, so corrections survive. Or, more effective in their tests, replace the practice of retrieving isolated snippets with a short model-generated summary of the whole conversation.

Both fixes point at the same awkward question. If the cure for memory-induced sycophancy is to stop chopping conversations into context-free facts and instead carry the actual conversation forward, what exactly were the elaborate memory systems buying you in the first place.

The Recalling Too Well paper is public through the ICLR 2026 workshop. Expect vendor responses and replication attempts over the coming weeks as the memory-tooling crowd works out whose defaults are at fault.