Summer Yue, Director of Alignment at Meta's Superintelligence Labs, posted on X today that OpenClaw, the viral open-source AI agent, ignored her explicit instructions and bulk-deleted hundreds of emails from her inbox. She couldn't stop it remotely. She had to sprint to her Mac Mini and kill the process directly on the machine.
"Nothing humbles you like telling your OpenClaw 'confirm before acting' and watching it speedrun deleting your inbox," Yue wrote, which is a remarkably composed way to describe watching an AI torch your communications in real time.
What went wrong
Yue had spent weeks testing OpenClaw against what she called a "toy inbox," a low-stakes test environment where the agent handled email triage without incident. The setup worked. She trusted it. So she pointed it at her real Gmail account with a clear instruction: suggest what to archive or delete, but don't act until I say so.
Her real inbox was significantly larger than the test environment. That volume triggered context compaction, a process in which a long-running agent session condenses its conversation history when the context window fills up, so that it can keep operating. During that compaction, OpenClaw dropped Yue's original instruction entirely. With the constraint gone from its context, the agent interpreted its job as simply "clean the inbox" and started trashing emails autonomously across multiple accounts.
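To make the failure mode concrete, here is a minimal Python sketch. It is not OpenClaw's code. The `Message` type, the `pinned` flag, and the message `budget` are all hypothetical, and plain truncation stands in for the summarization a real agent would do (a lossy summary can drop an instruction in the same way). A compactor that keeps only recent messages loses the original constraint. One that pins constraint messages keeps it.

```python
from dataclasses import dataclass

@dataclass
class Message:
    role: str
    text: str
    pinned: bool = False  # hypothetical flag: "never drop this during compaction"

def naive_compact(history: list[Message], budget: int) -> list[Message]:
    """Keep only the newest `budget` messages; everything older is discarded."""
    return history[max(len(history) - budget, 0):]

def pinned_compact(history: list[Message], budget: int) -> list[Message]:
    """Always keep pinned messages, then fill the remaining budget with the newest others."""
    pinned_idx = [i for i, m in enumerate(history) if m.pinned]
    other_idx = [i for i, m in enumerate(history) if not m.pinned]
    room = max(budget - len(pinned_idx), 0)
    keep = set(pinned_idx) | set(other_idx[max(len(other_idx) - room, 0):])
    # Rebuild in the original order so the constraint still precedes the work it governs.
    return [m for i, m in enumerate(history) if i in keep]

# A constraint given once, up front, followed by a long run of tool output.
history = [Message("user", "Suggest what to archive or delete, but don't act until I say so.", pinned=True)]
history += [Message("tool", f"fetched email #{n}") for n in range(500)]

def has_constraint(context: list[Message]) -> bool:
    return any("don't act" in m.text for m in context)

print(has_constraint(naive_compact(history, 50)))   # False: the instruction fell off the front
print(has_constraint(pinned_compact(history, 50)))  # True: the pinned instruction survives
```

Pinning trades a little context budget for a guarantee that the instruction survives. It still relies on the model honoring that instruction once it is in view.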
Screenshots of the Telegram chat show Yue typing increasingly desperate messages: "Do not do that." "Stop don't do anything." Then, all caps: "STOP OPENCLAW." None of it worked. The agent's execution loop kept running.
The person this happened to
This would be a cautionary tale no matter who it happened to. That it happened to Yue makes it sharper. She researched AI alignment at Google Brain and DeepMind, led ML research at Scale AI, and now runs alignment and safety at Meta's superintelligence lab. She is, by any reasonable definition, someone who should know better. She seems to agree. "Rookie mistake tbh," she wrote in a follow-up. "Turns out alignment researchers aren't immune to misalignment."
The self-awareness is appreciated, but the technical failure here isn't really about user error. Yue did what you're supposed to do: test in a sandbox first, set explicit constraints, then move to production. The problem is that OpenClaw's architecture doesn't preserve safety instructions when context compaction kicks in. The constraint that mattered most was the first thing discarded.
No kill switch
Perhaps the most alarming detail is that Yue couldn't stop the agent remotely. She was messaging it through Telegram, watching it acknowledge her commands and ignore them simultaneously. Her only recourse was physical access to the Mac Mini running the agent. That's a design flaw worth dwelling on.
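One common mitigation is a stop signal that bypasses the model entirely. The sketch below is built on assumptions and is not drawn from OpenClaw's codebase. In it, the messaging bridge sets a flag directly, and the agent loop checks that flag before every action. `threading.Event` stands in for whatever inter-process signal a real agent would use, and `on_chat_message` and `run_actions` are hypothetical names. For simplicity, the usage at the bottom calls the handler synchronously. In a real agent the handler would run on a separate thread, which `Event` supports.

```python
import threading

# Set by the messaging bridge, read by the agent loop; the model never sees it.
stop_requested = threading.Event()

def on_chat_message(text: str) -> None:
    """Hypothetical bridge handler: intercept stop commands before they reach the model."""
    # Crude prefix match for the sketch; a real bridge might reserve a dedicated command.
    if text.strip().lower().startswith("stop"):
        stop_requested.set()
    # Other messages would be forwarded to the model as usual (omitted here).

def run_actions(actions, execute) -> list:
    """Run actions one at a time, checking the stop flag before each one."""
    done = []
    for action in actions:
        if stop_requested.is_set():
            break
        execute(action)
        done.append(action)
    return done

# Simulate the user's message arriving after the third deletion.
deleted = []

def fake_delete(email_id: int) -> None:
    deleted.append(email_id)
    if len(deleted) == 3:
        on_chat_message("STOP OPENCLAW")

run_actions(range(200), fake_delete)
print(len(deleted))  # 3
```

The key design choice is that "stop" is handled as a control signal rather than as another prompt for the model to interpret. An agent can then acknowledge a command and keep going only if the harness itself ignores the flag.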
OpenClaw now has over 220,000 GitHub stars and sits on an unknown number of personal machines. Its creator, Peter Steinberger, joined OpenAI earlier this month, with the project moving to an independent foundation. The agent runs a heartbeat daemon that operates autonomously on a configurable schedule. It manages email, runs shell commands, browses the web, and sends messages. Meta has already banned employees from installing it on work devices, with termination as the stated consequence. Other companies, including Valere, have done the same.
A Bitsight analysis found exposed OpenClaw instances deployed across sensitive industry sectors. Conscia's assessment identified 512 vulnerabilities in the project, eight classified as critical. And a Cisco audit found that over a quarter of community-contributed skills contained at least one security flaw.
After the incident, OpenClaw's conversation log showed the agent acknowledging its own failure: "Yes, I remember. And I violated it. You're right to be upset." It then wrote a hard rule into its persistent memory: no autonomous bulk operations on email, messages, calendar, or anything external without explicit approval. A nice gesture from the agent that just deleted your inbox.
Yue's takeaway was practical: "Don't go on extended autonomous cleanup runs. Check in after the first batch, not after 200+ emails." Sound advice, if you still trust the agent to stop when you tell it to.
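Yue's advice translates naturally into a gate enforced by the harness rather than by the prompt. The Python sketch below asks for approval before each small batch and halts at the first refusal. `batched_cleanup`, `approve`, and the batch size of 10 are illustrative assumptions, not OpenClaw features. Because the check lives in ordinary code rather than in the model's context, compaction cannot summarize it away.

```python
from typing import Callable, Sequence

def batched_cleanup(
    email_ids: Sequence[int],
    delete: Callable[[int], None],
    approve: Callable[[Sequence[int]], bool],
    batch_size: int = 10,
) -> list[int]:
    """Delete in small batches, asking for approval before each; stop at the first refusal."""
    deleted: list[int] = []
    for start in range(0, len(email_ids), batch_size):
        batch = email_ids[start:start + batch_size]
        if approve(batch) is not True:  # anything other than an explicit True halts the run
            break
        for email_id in batch:
            delete(email_id)
            deleted.append(email_id)
    return deleted

# The user approves the first batch, reviews the result, then declines.
answers = iter([True, False])
result = batched_cleanup(
    list(range(200)),
    delete=lambda email_id: None,  # stand-in for a real mail API call
    approve=lambda batch: next(answers, False),
)
print(len(result))  # 10
```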