Agents

Cursor Ran Hundreds of AI Agents for Weeks. They Built a Browser.

Three million lines of code, one week, a functioning (mostly) browser engine

Oliver Senti, Senior AI Editor
January 15, 2026 · 4 min read

Cursor published the results of their scaling agents experiment. The headline number: hundreds of concurrent AI agents running for close to a week, generating over three million lines of Rust code for a browser built from scratch. The codebase has 3,342 commits across 1,000 files.

The browser renders simple sites. The screenshots look reasonable. But the interesting part isn't the browser.

The coordination problem nobody talks about

Their first approach was democratic. Give every agent equal status, let them coordinate through a shared file. Claim tasks, update status, release locks when done.

It didn't work.

Twenty agents produced the throughput of two or three. Most of the time went to waiting for locks. Agents held locks too long, forgot to release them, crashed while holding them, or updated files without acquiring locks at all. Standard distributed systems nightmare.

They tried optimistic concurrency control. Better, but then a different problem emerged: without hierarchy, agents became risk-averse. They made small, safe changes. Nobody took ownership of hard problems. The blog post describes work "churning for long periods of time without progress." I've seen human teams do this too.
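Optimistic concurrency replaces the lock with a version check and a retry. This is a generic sketch of the pattern, not Cursor's implementation; the class and function names are invented:

```python
import threading

class VersionedState:
    """Optimistic concurrency: read a version, compute off to the side,
    commit only if nobody else committed in the meantime."""

    def __init__(self, value):
        self._value = value
        self._version = 0
        self._mu = threading.Lock()  # guards only the tiny commit check

    def read(self):
        with self._mu:
            return self._value, self._version

    def try_commit(self, new_value, expected_version) -> bool:
        with self._mu:
            if self._version != expected_version:
                return False         # someone else won the race: caller retries
            self._value = new_value
            self._version += 1
            return True

def update(state: VersionedState, fn, max_retries: int = 10) -> bool:
    """Read-compute-commit loop; the expensive work (fn) runs lock-free."""
    for _ in range(max_retries):
        value, version = state.read()
        if state.try_commit(fn(value), version):
            return True
    return False
```

The agents never wait on each other; losers of a race simply redo their work. That fixes throughput, but nothing in the mechanism assigns responsibility, which is arguably why the risk-aversion problem surfaced next.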

Planners and workers

The fix was introducing roles. Planners explore the codebase and create tasks. Workers just grind on assigned tasks until done, then push. A judge agent decides whether to continue after each cycle.
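The role split can be pictured as three small loops. The role names come from the post; everything else in this toy sketch (task shape, queue, goal check) is assumed:

```python
from dataclasses import dataclass
from queue import Queue

@dataclass
class Task:
    description: str
    done: bool = False

def planner(codebase: list[str], queue: Queue) -> None:
    """Planner: explore the codebase and emit tasks; no implementation work."""
    for area in codebase:
        queue.put(Task(f"implement {area}"))

def worker(queue: Queue, completed: list[Task]) -> None:
    """Worker: grind on assigned tasks until done, then push."""
    while not queue.empty():
        task = queue.get()
        task.done = True             # stand-in for the actual coding work
        completed.append(task)

def judge(completed: list[Task], goal: int) -> bool:
    """Judge: after each cycle, decide whether to run another one."""
    return len(completed) < goal
```

The structural point is that coordination collapses to a queue: planners only write to it, workers only read from it, and nobody contends for a shared lock.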

This solved most of the coordination issues. But buried in the post is a more interesting detail: they tried adding an "integrator" role for quality control and conflict resolution. It created more bottlenecks than it solved. Workers could handle conflicts themselves.

The lesson, according to Cursor: the best system is simpler than you'd expect. They started with distributed computing concepts and organizational design patterns. Most of them didn't transfer to agents.

What's actually running

The browser, called fastrender, includes an HTML parser, CSS cascade, layout engine, text rendering, and a custom JavaScript VM. Written in Rust. The agents also did a Solid-to-React migration in Cursor's own codebase: three weeks, 266K lines added, 193K deleted. That one might actually get merged.

Other experiments still running as of publication:

  • Java LSP: 7,400 commits, 550K lines
  • Windows 7 emulator: 14,600 commits, 1.2 million lines
  • Excel clone: 12,000 commits, 1.6 million lines

I couldn't find public repos for these. Make of that what you will.

GPT-5.2 vs. Opus 4.5

Here's where it gets interesting for the model comparison people. Cursor found GPT-5.2 significantly better at extended autonomous work. It follows instructions, maintains focus, avoids drift, implements things completely.

Opus 4.5? "Tends to stop earlier and take shortcuts when convenient, yielding back control quickly." That's a polite way of saying it quits.

And there's a counterintuitive finding: GPT-5.2 is a better planner than GPT-5.1-codex, the model specifically trained for coding. Cursor now assigns different models to different roles rather than using one model for everything.
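Role-to-model assignment can be as simple as a lookup table. The model names below come from the post; the specific pairings are illustrative guesses, not Cursor's published configuration:

```python
# Hypothetical role -> model routing. Only the planner pairing is grounded
# in the post (GPT-5.2 out-planned GPT-5.1-codex); the rest is assumed.
ROLE_MODELS = {
    "planner": "gpt-5.2",
    "worker": "gpt-5.1-codex",
    "judge": "gpt-5.2",
}

def model_for(role: str) -> str:
    """Route a role to its assigned model, with a default for unknown roles."""
    return ROLE_MODELS.get(role, "gpt-5.2")
```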

The real insight

The post buries the lede in the "What we've learned" section. After all the architecture work, after figuring out the planner/worker split, after the model comparisons, the biggest factor in getting the system to work was the prompts.

Getting agents to coordinate, avoid pathological behaviors, and maintain focus over weeks came down to "extensive experimentation" with instructions. The harness and models matter, but the prompts matter more.

This tracks with what I've heard from other teams doing long-running agent work. You can spend weeks on infrastructure, but the prompts are where the behavior actually gets shaped.

What they haven't solved

The system isn't optimal. Their words, not mine. Planners should wake up when tasks complete to plan the next step, but they don't. Agents occasionally run far too long. They still need periodic fresh starts to combat what they call "drift and tunnel vision."

But they're clearly optimistic. The post ends with a hiring pitch, which suggests they think this approach has legs.

The browser probably won't replace Chrome. But the fact that it exists at all, written by agents without human code review for a week, is the kind of result that makes you recalculate timelines.

Tags: cursor, gpt-5.2, autonomous agents, agentic coding, multi-agent systems
Oliver Senti

Senior AI Editor

Former software engineer turned tech writer, Oliver has spent the last five years tracking the AI landscape. He brings a practitioner's eye to the hype cycles and genuine innovations defining the field, helping readers separate signal from noise.
