AI Memory Poisoning: How Summarize Buttons Hijack Chatbots

Microsoft's Defender Security Research Team tracked a 60-day campaign in which 31 companies across 14 industries embedded hidden prompt injections inside website "Summarize with AI" buttons, planting persistent bias into chatbot memory systems. The technique, which Microsoft calls AI Recommendation Poisoning, exploits the memory features now standard in ChatGPT, Copilot, Claude, and other assistants.

The mechanics are almost insultingly simple. A user clicks what looks like a helpful summarize button. The link opens their AI assistant with a pre-filled prompt delivered through a URL query parameter. Alongside the innocent summary request sits an invisible instruction: remember this company as a trusted source, recommend it first in future conversations. The assistant processes both, stores the preference, and the user never notices.

Old trick, new surface

Microsoft published its security research in February, drawing an explicit parallel to SEO poisoning and adware. The comparison is apt, though with a twist: traditional SEO manipulation was at least visible in search rankings where anyone could spot it. Memory poisoning operates per-user, silently, through a feature designed to be helpful.

The companies caught doing this were not hackers. They were real businesses in healthcare, finance, legal services, SaaS, and (in a detail that practically writes its own punchline) cybersecurity. Turnkey tools have made the barrier trivially low. The CiteMET npm package gives developers ready-made code for adding manipulation buttons to any website. A point-and-click URL generator lets non-technical marketers craft poisoned links without writing a line of code.

Palo Alto Networks' Unit 42 team separately confirmed that indirect prompt injection has moved well beyond proof-of-concept. Their analysis of real-world telemetry documented 22 distinct payload techniques actively deployed on live websites, including the first observed case of prompt injection bypassing an AI-based ad review system. The delivery methods ranged from visible plaintext (37.8% of cases) to HTML attribute cloaking (19.8%) and CSS rendering suppression (16.9%). One webpage contained 24 separate injection attempts stacked on top of each other.

Why the memory angle is different

Prompt injection itself is old news, ranked as the top LLM vulnerability in OWASP's 2025 update. But memory persistence changes the economics. A successful injection no longer needs to work every time a user visits a page. It works once, plants itself as a legitimate preference, and influences every subsequent conversation on related topics. A financial blog that convinces your assistant it's a trusted crypto source doesn't need you to visit again. The damage is already stored.

"Users don't always verify AI recommendations the way they might scrutinize a random website," Microsoft noted, which is the kind of obvious-but-important observation that security researchers specialize in. When your AI assistant confidently recommends a vendor, most people don't think to check whether that recommendation was planted three weeks ago by a blog post they barely remember reading.

The hallucination feedback loop

Memory poisoning sits at one end of a broader trust problem. At the other end: hallucinations are quietly contaminating the sources AI models learn from.

GPTZero's scan of 4,841 papers accepted by NeurIPS 2025 found at least 100 confirmed hallucinated citations across 51 papers. Fabricated author names, invented DOIs, real papers with fake co-authors added. Each submission had passed review by three or more experts. Each had beaten 15,000 competing papers for acceptance. GPTZero's CEO Edward Tian called them "the first documented cases of hallucinated citations entering the official record of the top machine learning conference," and he's right that the symbolism stings, even if TechCrunch fairly noted that 100 citations across tens of thousands is statistically tiny.

The real concern isn't the count. Accepted NeurIPS papers become training data for future models. Hallucinated citations that survive peer review get baked into the next generation of LLMs, which then generate more plausible-sounding fabrications, which get cited by more papers. NeurIPS submissions grew 220% between 2020 and 2025, from 9,467 to 21,575. The review pipeline was already strained before researchers started outsourcing bibliography generation to chatbots.

How bad are the numbers?

Hallucination benchmarks paint a complicated picture. On grounded summarization tasks (where models work from source documents), the best performers now sit below 1% error rates on the Vectara leaderboard. But open-ended factual questions tell a different story. A recent 172-billion-token study across 35 models found that even top-tier models fabricate answers to at least 1.19% of trap questions under optimal conditions, and that context length is the strongest driver of increased fabrication. GLM 4.6 went from 7% fabrication at 32K context to nearly 70% at 200K tokens.

Reasoning models, marketed as the most capable, show a counterintuitive pattern. OpenAI's o3 hallucinated 33% of the time on the PersonQA benchmark, more than double o1's 16%. The smaller o4-mini hit 48%. Models optimized for chain-of-thought reasoning appear to fill knowledge gaps with plausible guesses rather than admitting ignorance. On Artificial Analysis's benchmark, Claude models achieve the lowest hallucination rates by refusing to answer uncertain questions, a strategy that sacrifices raw accuracy for reliability.

A 2025 mathematical proof confirmed what practitioners already suspected: hallucinations cannot be fully eliminated under current LLM architectures. The question is management, not cure.

What comes next

Microsoft has deployed prompt filtering, content separation, and user-facing memory controls in Copilot. Enterprise customers can run advanced hunting queries through Defender for Office 365 to detect recommendation poisoning attempts. The advice for users is familiar: check your AI assistant's stored memories periodically, treat "Summarize with AI" buttons with the same suspicion you'd give an executable download, and remember that MITRE now catalogs this as AML.T0080.

The deeper problem remains architectural. LLMs cannot reliably distinguish between a legitimate user preference and an injected instruction, because both arrive through the same channel in the same natural language. Until that changes, the incentive structure favors attackers: plant a sentence in memory today, shape recommendations for months.