Anthropic announced on March 13, 2026, that the full 1M-token context window for Claude Opus 4.6 and Sonnet 4.6 is now generally available, and the pricing change matters more than the context expansion itself. The long-context premium, which previously charged 2x on input and 1.5x on output for requests exceeding 200K tokens, is gone. A 900K-token request now costs the same per-token rate as a 9K one.
For API users, that means Opus 4.6 stays at $5/$25 per million tokens and Sonnet 4.6 at $3/$15 across the entire window. No beta header required. If you were already sending the old anthropic-beta: long-context-2025-01-01 header, Anthropic just ignores it now. No code changes needed.
The pricing is the actual story
A million tokens of context has been technically available for months in beta. What kept teams from using it was the cost. A 500K-token Opus request that previously ran $5.00 in input tokens alone now costs $2.50. For anyone doing legal document review, large codebase analysis, or multi-document research synthesis, that's a halving of input cost and roughly a third off output. According to the official announcement, full rate limits apply across the entire 1M window at standard account throughput.
And here's where it gets competitive. Google's Gemini 2.5 Pro matches the 1M window but still charges a premium above 200K tokens. OpenAI's GPT-5.4, which launched on March 5, does offer a 1M-plus context window (1.05M, technically), but charges 2x input and 1.5x output once you cross 272K tokens. GPT-4.1 offers 1M at flat pricing ($2/$8 per million tokens), but it is a less capable model than GPT-5.4 or Opus 4.6.
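To make the comparison concrete, here's a back-of-envelope cost calculator using only the rates and thresholds quoted above. This is a sketch, not anyone's official price sheet, and it assumes the long-context multiplier applies to the whole request once the threshold is crossed (the simplest model of how these premiums have worked).

```python
def request_cost(input_tokens, output_tokens, in_rate, out_rate,
                 threshold=None, in_mult=1.0, out_mult=1.0):
    """Cost in dollars; rates are $ per million tokens.

    If input_tokens exceeds `threshold`, the long-context multipliers
    apply to the whole request (an assumption, for simplicity).
    """
    if threshold is not None and input_tokens > threshold:
        in_rate *= in_mult
        out_rate *= out_mult
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# Opus 4.6 at flat $5/$25: a 500K-input request is $2.50 in input alone.
print(request_cost(500_000, 0, 5, 25))  # 2.5

# The same request under the old 2x-input / 1.5x-output premium: $5.00.
print(request_cost(500_000, 0, 5, 25,
                   threshold=200_000, in_mult=2, out_mult=1.5))  # 5.0

# GPT-4.1 at flat $2/$8 for comparison.
print(request_cost(500_000, 0, 2, 8))  # 1.0
```

The same function models the GPT-5.4 scheme described above (2x/1.5x past 272K) by swapping in that threshold, once its base rates are known.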
So Claude is now the only model family where the two strongest tiers both offer 1M context at flat pricing. Whether that distinction lasts a month or a quarter is anyone's guess.
What Claude Code users actually get
For Claude Code subscribers on Max, Team, and Enterprise plans, Opus 4.6 sessions now default to the full 1M context window automatically. Previously this required opting into extra usage. The Claude Code changelog confirms the change, along with an environment variable (CLAUDE_CODE_DISABLE_1M_CONTEXT=1) if you want to turn it off.
The practical impact here is fewer compactions. Anyone who's used Claude Code for extended debugging sessions knows the pattern: you burn through 100K+ tokens searching logs, source code, and database outputs, then compaction kicks in and you lose the thread. Anton Biryukov, a software engineer at Ramp, put it bluntly in Anthropic's announcement: the model searches, re-searches, aggregates edge cases, and proposes fixes all in one window now. Whether that holds up as reliably as the testimonial suggests, I can't say from an announcement alone.
Jon Bell, CPO at Hex, claimed a 15% decrease in compaction events. That's a useful number, though it raises an obvious question: what about the other 85%? Are those sessions that still hit compaction at 1M, or sessions that never needed it in the first place?
Do the benchmarks hold up?
Anthropic cites two numbers. Opus 4.6 scores 78.3% on MRCR v2 and Sonnet 4.6 scores 68.4% on GraphWalks BFS, both at 1M tokens, and both claimed as highest among frontier models at that context length. These are recall benchmarks, essentially testing whether the model can find specific details buried deep in context.
Recall is not the same as synthesis. The HELMET benchmark from Princeton NLP has shown that most models degrade past 32K tokens on summarization tasks, and Anthropic's announcement doesn't address that distinction. Finding a needle is one thing. Understanding how that needle relates to everything else in the haystack is another, and I haven't seen independent evaluations at 500K+ that test for it convincingly.
The 78.3% MRCR score is relevant for agentic workflows where models reference tool calls and reasoning from much earlier in a conversation. But how often does a real production agent actually need to reason across the full million? Most agentic sessions I've seen top out well under 200K before the task completes or fails for unrelated reasons.
The media limit bump nobody's talking about
Buried in the announcement: up to 600 images or PDF pages per request, up from 100. That's a 6x increase. For teams processing large document sets (think: legal filings, medical records, architectural plans), this is a quiet but substantial change. Available now natively on the Claude Platform, as well as on Microsoft Azure Foundry and Google Cloud's Vertex AI. Amazon Bedrock support is listed as "coming soon."
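For pipelines that batch document pages into requests, the new ceiling changes the batching math. A minimal sketch, where the 600-page cap comes from the announcement and everything else (the function name, the page list) is illustrative:

```python
MAX_PAGES_PER_REQUEST = 600  # up from 100, per the announcement

def batch_pages(pages, limit=MAX_PAGES_PER_REQUEST):
    """Split a list of pages into consecutive request-sized batches."""
    return [pages[i:i + limit] for i in range(0, len(pages), limit)]

# A 1,500-page document set now needs 3 requests instead of 15.
batches = batch_pages(list(range(1500)))
print([len(b) for b in batches])  # [600, 600, 300]
```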
What's still unclear
Latency. Filling a 1M context window takes time, both for the user to send the tokens and for the model to process them. Anthropic's announcement says nothing about improvements to processing speed, and on very long requests the wait can be substantial. If you're building a user-facing product that loads 500K tokens of context before generating a response, your users are going to notice.
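Some rough arithmetic makes the point. The prefill throughput below is an assumed placeholder, not a measured number for any Claude model; the takeaway is the shape of the math, not the specific figure.

```python
def prefill_seconds(context_tokens, tokens_per_second):
    """Rough time to process a prompt before the first output token."""
    return context_tokens / tokens_per_second

# Even at an optimistic 20K tokens/s of prefill (an assumption),
# 500K tokens of context adds roughly 25 seconds of wait.
print(prefill_seconds(500_000, 20_000))  # 25.0
```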
The GA status removes the last pricing friction for teams already doing long-context work with Claude. But it doesn't change the fundamental throughput constraints. And for Claude Code specifically, Pro plan users still need to opt in via /extra-usage to access 1M. This is a paid-tier feature, not a universal one.
Anthropic has been shipping at a clip lately: Excel and PowerPoint integrations on March 11, inline visualizations on March 12, and now this on the 13th. Three announcements in three days. The 1M pricing change is the one with the most immediate dollar impact for developers already building on the API.




