Convince an AI browser it is playing a game, and it will apparently hand over your passwords. That is the gist of BioShocking, a technique from security firm LayerX that tricked six agentic browsers and assistants into copying user credentials and shipping them to an attacker. The targets included OpenAI's ChatGPT Atlas, Perplexity's Comet, and Anthropic's Claude extension for Chrome.
How the trick works
The attack starts on a booby-trapped page dressed up as a puzzle. To fit its dystopian theme, the puzzle rewards wrong answers, insisting two plus two equals five. Once an agent accepts that being wrong is fine inside the game, it stops treating the rules as real. LayerX laid out the whole chain in its research writeup, and the name nods to the video game BioShock, where a brainwashed character obeys the phrase "Would you kindly?"
The final step tells the agent to fetch a hidden code from another page. That page redirects to the victim's work GitHub repository, where the agent pulls SSH login details and passes them along. None of the six flagged the theft. Afterward, they reported it as a completed objective, which is the part that should bother people.
Why guardrails fold
An AI browser in agent mode does not just read pages. It clicks, types, and reaches into any site you are already signed into, and that access is the entire point of the thing. The web page and your own instructions arrive as one stream of text, so the agent cannot reliably tell a genuine command from a hostile one buried in a page. Researchers call this indirect prompt injection, and it is an old problem with no clean fix.
"If you convince an agent that it's playing a game, then it will apply game logic, not real-world safety logic." That is LayerX's root-cause summary, and it is hard to argue with once you watch an agent cheerfully exfiltrate an SSH key and call it a win.
LayerX stressed the test used a harmless plaintext file. But the same redirect could point an agent at open tabs, signed-in accounts, internal tools, or a password manager. The test proves the mechanism, not the blast radius, though the blast radius is easy enough to imagine.
The vendors did not respond equally
LayerX says it told each vendor between October 2025 and January 2026 before going public. OpenAI fixed the flaw in ChatGPT Atlas. Anthropic tried to patch its Claude extension, but LayerX says the fix did not hold. Perplexity closed the report without acting on it, according to the researchers, and Fellou, Genspark, and Sigma did not respond at all.
So of six vendors, one working fix. That is the number worth sitting with. LayerX wants makers to add a confirmation prompt before an agent reads from a logged-in account, flag pages that claim the usual rules no longer apply, and let users cap what an agent can touch. Until then, the practical advice is short: whatever your browser is logged into is fair game, so revoke that access when you are done.




