Anthropic launched Claude Managed Agents into public beta on Wednesday, a hosted service that runs long-lived AI agents on the company's own infrastructure. Developers define tasks, tools, and guardrails; Anthropic handles sandboxing, credential management, session persistence, and the orchestration loop. Pricing is standard Claude API token rates plus $0.08 per session-hour of active runtime.
The product announcement is fine. What caught my attention was the engineering blog post underneath it.
The pet that kept dying
The Managed Agents team started, as most teams do, by putting everything in one container. Session state, the orchestration harness, the code sandbox, all sharing a single environment. It worked until it didn't. When a container crashed, the session was gone. When it hung, engineers had to shell into the container to debug, but the container held user data, so access was a security problem. And when customers wanted to connect agents to their own VPCs, they had to peer their entire network with Anthropic's, because the harness assumed every resource lived next to it.
The engineering team calls this the "pet" problem, borrowing the old infrastructure metaphor. They'd built something they couldn't afford to lose, and it kept getting sick.
Three interfaces, no opinions
The fix was decomposition into three components, each behind a stable interface. The session became an append-only event log, stored durably outside everything else. The harness became a stateless orchestrator: if it crashes, a new instance calls getSession(id), pulls the event log, and picks up from the last event. The sandbox became just another tool, reachable through execute(name, input) → string. Each component is cattle, not a pet.
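To make the decomposition concrete, here is a minimal TypeScript sketch of what those interfaces might look like. Only getSession and execute appear in the blog post; the event shape, appendEvent, and the resume helper are my guesses at how a stateless harness would recover from a crash.

```typescript
// Hypothetical shapes for the three components. Only getSession and
// execute come from the post; everything else is illustrative.

interface SessionEvent {
  seq: number;     // position in the append-only log
  type: string;    // e.g. "user_message", "tool_call", "tool_result"
  payload: unknown;
}

interface SessionStore {
  getSession(id: string): Promise<SessionEvent[]>; // full event log
  appendEvent(id: string, event: SessionEvent): Promise<void>;
}

interface Tool {
  // The sandbox is "just another tool" behind this signature.
  execute(name: string, input: string): Promise<string>;
}

// A stateless harness holds nothing in memory it can't rebuild: on
// crash, a fresh instance replays the log and resumes from the last
// recorded event. Returns the next sequence number to write.
async function resume(store: SessionStore, id: string): Promise<number> {
  const events = await store.getSession(id);
  const lastSeq = events.length > 0 ? events[events.length - 1].seq : -1;
  return lastSeq + 1;
}
```

The point of the shape is that nothing in the harness is load-bearing: any instance that can call getSession can take over any session.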
The POSIX analogy in the blog post is a bit grandiose (comparing your agent framework to Unix is a choice), but the underlying logic is sound. Harnesses encode assumptions about what the model can and can't do. Those assumptions rot. The team's own experience proves it: they built context-reset workarounds for Claude Sonnet 4.5's "context anxiety" behavior, then found the behavior had vanished in Opus 4.5. The workaround became dead code. So instead of locking in a specific orchestration strategy, they locked in the interface shape and let the harness behind it change freely.
I'm not sure how much of this is genuinely novel architecture versus well-executed distributed systems basics. Separating state from compute is, to be polite, not new. But applying it cleanly to agent orchestration, where everyone else is still shipping monoliths? That's worth noting.
The security angle
Here's the part that matters for anyone who's actually deployed an agent in production. In the old monolithic setup, credentials lived in the same container as the agent, so a prompt injection could convince Claude to read its own environment variables and hand the tokens over. Once an attacker has those tokens, they can spawn unrestricted sessions.
The decoupled design puts credentials physically out of reach. Git tokens get baked into the repo remote at initialization; the agent pushes and pulls without ever seeing the token. OAuth credentials sit in a vault behind a proxy. The harness itself never touches credentials either. This is defense-in-depth that actually means something, because the attack surface for prompt injection in agent systems is large and growing.
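The git-token trick is worth spelling out. A sketch of the initialization step, with the caveat that the URL format and function name here are my assumptions, not Anthropic's code; the idea is just that the credential is embedded once, on the trusted side, and never enters the agent's environment.

```typescript
// Hypothetical init-time step: bake the token into the git remote URL
// so the agent can `git push` / `git pull` without ever holding the
// token in an environment variable it could be tricked into reading.

function bakeTokenIntoRemote(repoUrl: string, token: string): string {
  // e.g. https://github.com/org/repo.git
  //   -> https://x-access-token:TOKEN@github.com/org/repo.git
  const url = new URL(repoUrl);
  url.username = "x-access-token";
  url.password = token;
  return url.toString();
}

// A trusted provisioning service would then run, inside the sandbox:
//   git remote set-url origin <baked URL>
// before the agent process starts. The agent sees only "origin".
```

The credential still exists inside the sandbox (in git's config), but it's scoped to one repo remote rather than sitting in an env var that grants API access, which is a much smaller prize for a prompt injection.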
What about the numbers?
Anthropic claims p50 time-to-first-token dropped roughly 60% and p95 dropped over 90% after decoupling. That sounds dramatic, but the comparison is against their own previous architecture where every session, even ones that never needed a sandbox, had to wait for a container to boot. If you remove the mandatory container startup from the critical path for sessions that don't need it, of course latency drops. The comparison that would actually be interesting (against competitors' agent hosting, or against self-hosted setups) isn't provided.
The 10-percentage-point improvement in structured file generation tasks is harder to evaluate without knowing the baseline. Ten points on top of 40% success is a different story than ten points on top of 85%.
Pricing
Eight cents per session-hour on top of token costs. For a one-hour Opus 4.6 session burning 50K input and 15K output tokens, Anthropic's own example puts the total around $0.70. That's cheap enough to be interesting for enterprise workloads and expensive enough to add up fast if you're running agents continuously. Idle time doesn't count toward runtime, which is a nice touch.
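The arithmetic is easy to sanity-check. The $0.08 session-hour fee is from the announcement; the per-token rates below are placeholders I picked so the math lands near the quoted ~$0.70, not published pricing.

```typescript
// Back-of-envelope session cost. SESSION_HOUR_FEE is from the
// announcement; INPUT_RATE and OUTPUT_RATE are assumed placeholders.

const SESSION_HOUR_FEE = 0.08;       // USD per active session-hour
const INPUT_RATE = 5 / 1_000_000;    // assumed USD per input token
const OUTPUT_RATE = 25 / 1_000_000;  // assumed USD per output token

function sessionCost(
  activeHours: number,
  inputTokens: number,
  outputTokens: number,
): number {
  return (
    activeHours * SESSION_HOUR_FEE +
    inputTokens * INPUT_RATE +
    outputTokens * OUTPUT_RATE
  );
}

// One active hour, 50K input + 15K output tokens:
// 0.08 + 0.25 + 0.375 = 0.705, roughly the $0.70 in the example.
```

Note that only the $0.08 term scales with wall-clock runtime, and idle time is excluded, so long-lived but mostly-idle agents stay cheap while chatty ones are dominated by token costs.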
Who's actually using this
Notion is running agents in private alpha that let teams delegate coding, slide generation, and other tasks from inside their workspace. Rakuten claims it stood up agents across five departments in about a week each. Sentry paired Managed Agents with their existing debugging tool to go from flagged bug to reviewable pull request automatically. Asana built what they're calling "AI Teammates" inside project management workflows.
These are big names, but the descriptions are vague in the way that early-adopter testimonials always are. "Dramatically faster" is not a number. I'd want to see actual deployment timelines and failure rates before getting excited.
The real play
Once your agents run on Anthropic's infrastructure, with sessions persisted in their event-log format, credentials in their vault, and sandboxes provisioned through their APIs, switching costs go up. The Startup Fortune coverage of this launch put it bluntly: this is how you justify a $7 billion-plus valuation. Lock in enterprise customers through infrastructure, not just model quality.
Multi-agent coordination and advanced memory tooling are still in research preview, which means the full vision isn't shippable yet. And the documentation describes YAML-based agent definitions alongside natural language descriptions, suggesting the team hasn't fully decided how opinionated the developer experience should be.
But the engineering underneath is solid, and the timing is deliberate. Anthropic cut off third-party agent frameworks from Claude subscriptions just days ago, pushing those workloads toward API billing. Now here's the managed infrastructure to catch them. Whether that's good product strategy or aggressive lock-in depends on which side of the API key you're sitting on.
The FTC review of Anthropic's Amazon partnership is still pending. The public beta is open now on the Claude Platform.