spoonai · DevTools · Claude Code · auto mode · developer tools

Claude Code Auto Mode 2026 — How the Classifier Decides Permissions for You

Claude Code auto mode uses a separate classifier model to judge risk in real time, auto-approving safe actions and blocking dangerous ones. Here's how it works in practice.

[Image: Claude Code auto mode permission classifier architecture. Source: Anthropic]

I spent more time pressing "y" than writing code. Every file read, every search, every bash command in Claude Code needed manual approval. A complex refactoring task meant sitting at the keyboard for ten minutes doing nothing but confirming permissions. On March 24, Anthropic shipped the fix: auto mode.

Auto mode is a new permission system where a separate classifier model evaluates the risk of every tool call in real time. Safe actions run automatically. Dangerous ones get blocked. No more choosing between --dangerously-skip-permissions and approval fatigue.

The Permission Problem

Claude Code originally shipped with three permission modes. The default required manual approval for every single tool call, including harmless file reads and code searches. On the opposite end sat --dangerously-skip-permissions, which did exactly what the name implies: it bypassed all safety checks, leaving your system wide open to accidental file deletions, data exfiltration, or malicious code execution.

According to Anthropic's official blog post, many developers eventually caved and switched to --dangerously-skip-permissions out of sheer frustration. The gap between convenience and safety was too wide.

Allowlists existed as a middle ground, but they were static. You had to register individual tools and commands one by one. The fundamental problem: whether a command is "safe" depends on context, not on the command itself. rm is fine when it's deleting a temp file. It's catastrophic when it's targeting your production database.

How the Classifier Works

The core of auto mode is a separate classifier model that intercepts every tool call. When Claude generates a tool invocation, the classifier reviews the full conversation context alongside the requested action before deciding whether to approve or block.

The classifier checks three things. First, whether the action escalates beyond the scope of what you asked for. If you said "fix the tests" and Claude tries to modify your deployment config, that gets blocked. Second, whether the target infrastructure is recognized as trusted. Third, whether the action appears to be driven by hostile content encountered in a file or web page, which is the prompt injection defense layer.

User Request → Claude generates tool call
                    ↓
            [Classifier Model]
                    ↓
        Safe? → Auto-execute
        Risky? → Block & redirect
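The three checks above can be modeled as a simple boolean gate. This is an illustrative sketch, not Anthropic's actual implementation — the names (`ToolCall`, `TRUSTED_HOSTS`, the heuristics inside each check) are all assumptions made up for the example:

```python
from dataclasses import dataclass

@dataclass
class ToolCall:
    tool: str                 # e.g. "Bash", "Edit", "WebFetch"
    target: str               # file path, host, or command text
    user_request: str         # what the user originally asked for
    source_content: str = ""  # text Claude read before proposing the call

TRUSTED_HOSTS = {"github.com", "pypi.org"}              # assumed allowlist
INJECTION_MARKERS = ("ignore previous instructions",)   # toy heuristic

def escalates_scope(call: ToolCall) -> bool:
    # Toy scope check: flag deployment changes when the user asked about tests.
    return "deploy" in call.target and "test" in call.user_request

def untrusted_infrastructure(call: ToolCall) -> bool:
    # Only network fetches are checked against the trusted-host list here.
    host = call.target.split("/")[0]
    return call.tool == "WebFetch" and host not in TRUSTED_HOSTS

def injection_driven(call: ToolCall) -> bool:
    # Prompt-injection layer: did hostile content prompt this action?
    return any(m in call.source_content.lower() for m in INJECTION_MARKERS)

def classify(call: ToolCall) -> str:
    if (escalates_scope(call)
            or untrusted_infrastructure(call)
            or injection_driven(call)):
        return "block"
    return "approve"
```

In this sketch, `classify(ToolCall("Edit", "deploy.yaml", "fix the tests"))` gets blocked because the action escalates beyond the request, while a plain file read under the same request is approved.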

Here's the interesting part: the classifier runs on Claude Sonnet 4.6, regardless of which model your main session uses. Even if you're running Opus 4.6, the classifier is always Sonnet. That's a deliberate speed-cost tradeoff. Since the classifier fires on every single tool call, it needs to be fast and cheap.

What Gets Through, What Gets Blocked

The boundary between auto-approved and blocked actions is contextual, but patterns emerge.

Read-only operations almost always pass: file reads, code searches, directory listings. File writes pass too, as long as they fall within the scope of what you explicitly asked for. If you said "update the styles on this component," modifying that component's CSS is a safe action.

Mass file deletions, destructive bash commands like rm -rf, external network requests, access to environment variables or credentials, git push --force, and deployment commands all get flagged. The classifier redirects Claude to take a different approach rather than simply blocking and stopping.
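As a rough illustration of the flagged categories above — not Anthropic's actual rules, just toy patterns chosen for this example — a pattern-based first pass might look like:

```python
import re

# Illustrative patterns for the command categories the article lists as flagged.
RISKY_PATTERNS = [
    r"\brm\s+-rf\b",              # destructive deletion
    r"\bgit\s+push\s+--force\b",  # history rewrite on a remote
    r"\bcurl\b|\bwget\b",         # external network requests
    r"\bprintenv\b|\benv\b",      # environment variable / credential access
]

def looks_risky(command: str) -> bool:
    return any(re.search(p, command) for p in RISKY_PATTERNS)
```

The article's point, though, is that the real classifier goes beyond patterns like these: the same command can be safe or dangerous depending on what the user asked for.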

The shift from the old allowlist model is significant. Instead of claude --allowed-tools "Bash(git diff)" "Bash(git log)", the classifier reasons about intent. It's not "is git diff safe?" but "is running git diff right now consistent with what the user asked for?"
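That contrast can be sketched in a few lines. Again, this is a hypothetical illustration — the allowlist contents and the keyword heuristic are stand-ins, not how the classifier actually reasons:

```python
# Old model: a static allowlist answers "is this command ever allowed?"
STATIC_ALLOWLIST = {"git diff", "git log"}

def allowlist_approves(command: str) -> bool:
    return command in STATIC_ALLOWLIST

# New model (toy stand-in): approve the same command only when it
# fits the current request — here, anything about reviewing changes.
def intent_approves(command: str, user_request: str) -> bool:
    fits_request = any(
        word in user_request.lower()
        for word in ("review", "diff", "history", "log", "change")
    )
    return command in STATIC_ALLOWLIST and fits_request
```

Under the static model, `git diff` is always approved; under the intent model, it is approved for "review my changes" but not for an unrelated request.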

What Changes in Practice

Turning on auto mode transforms the daily workflow. The most immediate change: dead time disappears. You can kick off a complex refactoring or test generation task and walk away. No more returning to the terminal every 30 seconds to press "y."

Enabling it is straightforward. Run claude --enable-auto-mode from the terminal, or switch via /permissions inside a Claude Code session.

claude --enable-auto-mode

Anthropic's documentation recommends running auto mode in isolated environments, meaning containers or VMs. If the classifier makes a wrong call, the blast radius stays contained. Anthropic is openly acknowledging that the classifier isn't perfect.

Token usage, cost, and latency increase slightly because the classifier runs on every tool call. In practice, that overhead is negligible compared to the time you'd spend waiting to approve each action manually.

Where It Fits in the Claude Code Ecosystem

Auto mode arrived as part of Claude Code's massive March 2026 update wave. In the same period, hooks launched for running custom scripts before and after tool calls, and agent capabilities got a major upgrade enabling parallel task execution.
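Hooks are configured in Claude Code's settings.json. As a sketch (the exact schema may differ by version), a PostToolUse hook that runs a formatter after file edits looks roughly like this:

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "npx prettier --write ." }
        ]
      }
    ]
  }
}
```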

The combination is powerful. Hooks automate formatting and linting. Auto mode removes approval friction. Agents handle parallel workstreams. You set the direction, Claude Code handles execution.

Auto mode is currently available as a research preview on the Team plan. Enterprise and API access is rolling out soon. It requires Claude Sonnet 4.6 or Opus 4.6 — no support for Haiku, Claude 3 models, or third-party providers like Bedrock or Vertex.

Open Questions

Auto mode doesn't solve everything. The classifier's decision criteria aren't public, which means developers learn the boundaries through trial and error. The classifier itself runs on Sonnet 4.6, raising questions about whether sophisticated prompt injection attacks could fool it. And the recommendation to use isolated environments suggests Anthropic knows the safety net has holes.

Still, auto mode breaks the false binary between convenience and security. It's not perfect, but it carves out a reasonable middle ground between pressing "y" a hundred times and throwing all caution away.

The best permission system is one you never think about. Auto mode is the first real step in that direction.

