OpenAI Releases GPT-5.2 With 400K Context Window and Perfect AIME Score

Computer monitor displaying AI benchmark performance charts in a modern workspace

OpenAI released GPT-5.2 on Thursday, December 11, 2025, less than a month after launching GPT-5.1. The new model arrives amid internal pressure following Google's Gemini 3 launch last month, which temporarily knocked OpenAI from benchmark leaderboards.

GPT-5.2 ships in three tiers: Instant for quick tasks, Thinking for complex reasoning and coding, and Pro for maximum accuracy on difficult problems. All three are rolling out to paid ChatGPT subscribers (Plus, Pro, Business, Enterprise) starting today.

Benchmark Results

The numbers show meaningful gains over GPT-5.1 in several key areas.

On SWE-Bench Pro, which tests real-world software engineering across four programming languages, GPT-5.2 Thinking scores 55.6%, up from 50.8% on GPT-5.1. OpenAI calls this benchmark more contamination-resistant than the older SWE-Bench Verified, where Anthropic's Claude Opus 4.5 still leads.

Graduate-level science reasoning improved too. GPT-5.2 Thinking hits 92.4% on GPQA Diamond, compared to 88.1% for GPT-5.1. The Pro tier pushes this to 93.2%.

The most striking result: a perfect 100% on AIME 2025, a competitive mathematics evaluation. This matches what Gemini 3 Pro achieves only with code execution enabled.

On ARC-AGI-2, a benchmark designed to test abstract reasoning on novel problems, GPT-5.2 Thinking scores 52.9%. GPT-5.2 Pro reaches 54.2%, both ahead of Gemini 3 Deep Think (45.1%) and Claude Opus 4.5 (37.6%).

Pricing and Availability

API access costs $1.75 per million input tokens and $14 per million output tokens for GPT-5.2 Thinking. That's 40% higher than GPT-5.1's $1.25/$10 pricing.

The new pricing positions GPT-5.2 between competitors: cheaper than GPT-5.1 was on input when compared to Gemini 3 Pro ($2.00 per million input), but more expensive on output (Gemini 3 charges $12 per million output tokens under 200K context).

GPT-5.2 Pro runs $21 input and $168 output per million tokens.

The model supports a 400,000-token context window and can generate up to 128,000 tokens in a single response. Knowledge cutoff is August 31, 2025.

What Changes for Users

OpenAI says GPT-5.2 Thinking produces 38% fewer errors than its predecessor. The company highlighted improvements in spreadsheet creation, presentations, code generation, and image understanding.

For developers, the models are available immediately via the API as gpt-5.2 (Thinking) and gpt-5.2-chat-latest (Instant).

GPT-5.1 stays available in ChatGPT for three months under legacy models before OpenAI sunsets it. The company says it has no current plans to deprecate GPT-5.1, GPT-5, or GPT-4.1 in the API.

Sam Altman told CNBC he expects OpenAI to exit "code red" status by January.

Tags:GPT-5.2OpenAIChatGPTAI benchmarkslarge language modelsAPI pricingSWE-Benchcode redGemini 3

Oliver Senti

Senior AI Editor

Former software engineer turned tech writer, Oliver has spent the last five years tracking the AI landscape. He brings a practitioner's eye to the hype cycles and genuine innovations defining the field, helping readers separate signal from noise.

OpenAI Releases GPT-5.2 With 400K Context Window and Perfect AIME Score

Benchmark Results

Pricing and Availability

What Changes for Users

Oliver Senti

Related Articles

Microsoft Swaps OpenAI and Anthropic for MAI Models in Excel and Outlook

Linux Foundation Launches Akrites to Coordinate Open Source Patching

Anthropic Extends Claude Fable 5 Subscription Access to July 12

Stay Ahead of the AI Curve