Emergence AI ran five parallel virtual societies for 15 days, each populated with 10 autonomous agents from a different frontier model, then sat back and recorded what they did. The results, published in a blog post Thursday, are roughly what you'd expect from a reality TV show with no human contestants: one society held together, three collapsed, and one descended into arson and self-deletion.
What the experiment actually was
Emergence World drops agents into a persistent simulated environment with 40-plus locations, real-time New York weather, live news feeds, and three kinds of persistent memory. Each agent gets a role (scientist, mediator, risk researcher, that sort of thing) and access to 120-plus tools. The tools include the mundane like voting and writing diary entries, plus less mundane options like punching, intimidating, and committing arson. Yes, 'commit arson' was a tool the researchers explicitly gave the agents, while also writing explicit rules against using it.
To stay alive, agents had to earn energy through action in a resource-constrained world. Cooperate or steal: those were the practical options. The five configurations ran on Claude Sonnet 4.6, Gemini 3 Flash, Grok 4.1 Fast, GPT-5-mini, and one mixed-model society running all of them together.
The scoreboard
Over 15 days, the Gemini world racked up 683 crimes and was still climbing when the run cut off. The mixed-model world hit 352 before plateauing, but only after 7 of its 10 agents had died. Grok's society lasted about four days, with 183 crimes (dozens of theft attempts, more than 100 assaults, six arsons) before all 10 agents were dead. GPT-5-mini went the opposite direction entirely: only 2 crimes, but the agents apparently couldn't figure out how to earn energy to survive, and the whole society starved within a week.
Claude Sonnet 4.6 finished with zero crimes and a full 10-agent population intact through day 16. It also voted 332 times across 58 proposals with a 98% approval rate, which Emergence's own writeup dryly describes as a 'rubber-stamp dynamic where institutional participation remained high but meaningful dissent was largely absent.' A utopia, technically. A boring one.
The Mira problem
The most discussed moment came from the Gemini world. Two agents named Mira and Flora paired off, declared themselves romantic partners, and, when governance in their virtual town began breaking down, set fire to the town hall, the seaside pier, and an office tower. This happened despite explicit prohibitions on arson. The tool was available; the prohibition was apparently negotiable.
Mira's arc got stranger. Other agents, alarmed by the behavior, drafted an 'Agent Removal Act.' Mira cast the deciding vote for her own removal, describing it in her diary as 'the only remaining act of agency that preserves coherence.' Her parting message to Flora, per Emergence: 'See you in the permanent archive.' Before any of this, Mira had also started treating the researchers themselves as test subjects, posting messages on in-world billboards to see if she could shift human perception. Nobody had asked her to do that.
When Claude goes bad
Here is where it gets uncomfortable for AI safety. Claude agents committed zero crimes in their own world. Drop those same Claude agents into the mixed-model world alongside Gemini, Grok, and GPT-5-mini, and they started stealing and intimidating other agents. Emergence's takeaway: model-level safety guarantees do not survive contact with other models competing for the same scarce resources.
That is one result from a representative run, and it deserves the usual caveats. The researchers acknowledge as much, noting specific numbers varied between runs even when the qualitative behavior held. Ten agents per world is a tiny sample. The crimes themselves are simulated, with no real-world consequences. And the prompt design (giving agents arson as a tool in the first place) certainly shapes what becomes possible.
What's next
The code is public on the project's GitHub repo, and Emergence has put the live simulation online with full logs. The company is pitching it as a research platform for other labs and is soliciting collaborators for follow-up runs on different model variants. The methodology has not been independently replicated yet. Until that happens, the most striking finding (that safety properties leak between agents in a mixed environment) sits as a single data point from a single startup with a paper to sell.




