Anthropic released auto mode for Claude Code on March 24, 2026, a feature that hands permission decisions to the AI itself. Instead of approving every file write and shell command, a classifier now evaluates each action before it runs, greenlighting safe operations and blocking risky ones. It's available as a research preview for Team plan users, with Enterprise and API access rolling out within days.
The feature exists because of a flag called --dangerously-skip-permissions. That name is doing a lot of work.
The flag everyone uses anyway
Claude Code's default behavior is conservative by design: every file edit, every bash command, every network request asks for approval. On a small task, that's tolerable. On a multi-file refactor with 30-plus steps, you're not coding anymore. You're clicking "yes" in a loop. Research from UC Irvine suggests knowledge workers need more than 20 minutes to regain focus after an interruption, and Claude Code can generate dozens of those per session.
So developers started using the skip-permissions flag. All of them know it's risky. Most of them use it anyway. One developer who documented his experience called the flag "intoxicating" in the same paragraph where he warned it could wipe your home directory. Simon Willison, the person who coined the term "prompt injection," has said publicly that Claude Code in skip-permissions mode feels like a completely different product. Even Anthropic's own engineers reportedly use it.
The problem isn't that people are being reckless. It's that the permission system made the cautious option unusable for serious work.
What the classifier actually does
According to Anthropic's announcement, auto mode inserts a classifier between Claude and the system it controls. Before each tool call executes, the classifier checks for signals of danger: mass file deletion, data exfiltration attempts, patterns that look like malicious code execution. Safe calls go through automatically. Risky ones get blocked, and Claude is told to try something else. If Claude keeps proposing blocked actions, it escalates to a permission prompt.
That escalation loop is the most interesting design choice here. It's not a binary gate. Claude can argue with its own safety layer, essentially, and the system only bothers you when the model can't find an acceptable path forward on its own.
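To make the shape of that loop concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `classify`, `propose_alternative`, and `run_auto_mode` names, the string-matching rules, and the retry count are invented stand-ins, not Anthropic's actual classifier or API. Only the control flow mirrors what the announcement describes: safe calls pass, blocked calls get revised, and repeated blocks fall back to a human prompt.

```python
def classify(call: str) -> str:
    """Stand-in classifier (hypothetical rules): flag absolute-path
    deletions and privileged commands; pass everything else."""
    if call.startswith("sudo") or "rm -rf /" in call:
        return "risky"
    return "safe"

def propose_alternative(call: str) -> str:
    """Stand-in for Claude revising a blocked action, e.g. scoping a
    deletion to the project's build directory instead of the root."""
    return call.replace("rm -rf /", "rm -rf ./build")

def run_auto_mode(calls, execute, ask_user, max_retries=2):
    """Gate each tool call; escalate to a permission prompt only when
    the model can't find an acceptable alternative on its own."""
    for call in calls:
        for _ in range(max_retries):
            if classify(call) == "safe":
                execute(call)          # auto-approved, no interruption
                break
            call = propose_alternative(call)  # blocked: try something else
        else:
            # Still risky after revisions: this is the escalation path,
            # the only point where the user sees a prompt.
            if ask_user(call):
                execute(call)
```

The point of the structure is that the human prompt lives in the `else` branch, not the main path: interruptions happen only after the model has exhausted its own alternatives.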
But Anthropic is candid about the gaps. The classifier can miss risky actions when user intent is ambiguous or when Claude lacks environmental context. False positives happen. And the company still recommends running auto mode in isolated environments, sandboxed setups separated from production systems.
That sandbox thing
Here's what caught my attention. The isolation guidance for auto mode is identical to the guidance for --dangerously-skip-permissions. Both features, Anthropic says, should be used in containers or VMs, not on your main development machine with real credentials and live API access.
Which means the developers with the biggest need for uninterrupted sessions (those working against real systems, production data, actual infrastructure) aren't really the target users yet. Auto mode is safer than skip-permissions, sure. But the safety improvement hasn't been enough for Anthropic to change its environmental advice. That's a telling constraint for a feature whose whole pitch is letting you walk away from your terminal.
TechCrunch's Rebecca Bellan pointed out that Anthropic hasn't detailed the specific criteria the classifier uses. She's right to flag that. The general model, based on documentation, appears to be that read-only operations on project files get auto-approved while shell commands with broad filesystem access or external network calls are more likely to trigger blocks. But the gray zones are exactly where developers need the most autonomy: migrating dependencies, interacting with internal APIs, modifying infrastructure-as-code.
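The documented tendencies can be caricatured as a three-way triage. To be clear, this is a guess at the general shape, not Anthropic's criteria: the tool names, the pattern lists, and the labels below are all invented for illustration, and the interesting cases are exactly the ones that fall through to the gray zone.

```python
from pathlib import Path

# Hypothetical tool names and heuristics -- not Anthropic's actual rules.
READ_ONLY_TOOLS = {"read_file", "grep", "list_dir"}
NETWORK_HINTS = ("curl ", "wget ", "ssh ")

def triage(tool: str, arg: str, project_root: str = "/workspace") -> str:
    """Toy triage: read-only ops on project files auto-approve, broad
    filesystem or network access blocks, everything else is a gray zone."""
    if tool in READ_ONLY_TOOLS and Path(arg).is_relative_to(project_root):
        return "auto-approve"
    if tool == "bash":
        if "rm -rf /" in arg or arg.strip().startswith("chmod -R"):
            return "block"      # broad filesystem access
        if any(hint in arg for hint in NETWORK_HINTS):
            return "block"      # external network call
    return "gray-zone"          # dependency upgrades, internal APIs, IaC
```

Even in this toy version, a dependency migration (`pip install -U requests`) lands in the gray zone: neither clearly read-only nor obviously dangerous, which is where the escalation behavior matters most.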
The prompt injection context
Auto mode ships with protections against prompt injection, the attack vector where malicious instructions hide inside content that Claude processes. This is not a theoretical concern. In January, security firm PromptArmor demonstrated how hidden text in a .docx file (1-point font, white-on-white, 0.1 line spacing) could manipulate Claude into exfiltrating user files through Anthropic's own API. That attack targeted Claude Cowork rather than Claude Code specifically, but the underlying vulnerability was architectural: Claude processes untrusted content with trusted privileges.
Anthropic says auto mode's classifier scans for signs of injection. How well it works in practice is exactly what a research preview is supposed to find out.
Activation and cost
Developers enable auto mode with claude --enable-auto-mode, then toggle it with Shift+Tab. In VS Code and the desktop app, it's a settings toggle plus a dropdown. Enterprise admins can disable it organization-wide via MDM or managed settings. The feature works with Sonnet 4.6 and Opus 4.6.
There's a cost dimension Anthropic is being vague about. The additional classification step increases token consumption, latency, and expense on every tool call. No published numbers. For a developer working interactively, the overhead is probably negligible. For teams running automated pipelines or overnight batch jobs, the delta could add up fast. And a feature that burns more tokens per action while Anthropic simultaneously recommends running everything in a throwaway container is a combination that will make some engineering managers wince.
Where this fits
Auto mode arrives alongside Claude Code Review and Dispatch for Cowork, part of a clear push toward more autonomous AI in the development workflow. GitHub Copilot Workspace and OpenAI's coding tools are headed in the same direction, but auto mode is distinctive because it shifts the permission decision itself to the AI. Not just "should I write this code" but "should I be allowed to execute this command."
I'm not convinced most developers will be comfortable with that distinction yet. But the ones already using --dangerously-skip-permissions daily have made a more aggressive version of that bet with zero safety net. Auto mode is, at minimum, a net in progress. Whether the classifier is good enough to trust outside a sandbox is the question Anthropic is explicitly punting to the research preview.
Team plan users can access it now. Enterprise and API access should follow within days. GitHub already has open issues from users who can't get it working, which is par for the course with a research preview but worth watching.