
DeepSeek Cuts Input Cache Prices 90% Across All API Models

Cache hits now bill at one-tenth the launch rate. V4-Pro's 75% promotional discount runs through May 5.

Oliver Senti
Senior AI Editor
April 27, 2026 · 2 min read

DeepSeek announced Monday that input cache hit prices across its API are now one-tenth of their original rates, with the change applying to the entire V4 model lineup. The 75% promotional discount on the flagship V4-Pro also remains active through May 5.

What the math actually does

On V4-Flash, cache hits now run $0.028 per million input tokens, against $0.14 on a cache miss. On V4-Pro, it's $0.145 cached versus $1.74 uncached. That works out to 80 to 92 percent off the cache-miss rate for anything the model has already seen, applied automatically whenever a prompt prefix matches a cached version. No flags. No special endpoints.
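The mechanics are worth spelling out. Here is a minimal sketch, assuming V4 keeps the OpenAI-compatible interface DeepSeek's API already exposes; the model identifier and file path are placeholders, not confirmed V4 names. The only requirement is keeping the long, static prefix byte-identical across calls and putting the variable input last.

```python
# Sketch: cache hits come from reusing an identical prompt prefix, so keep the
# static parts (system prompt, tool schemas, few-shot examples) unchanged and
# append the variable user input at the end of the request.
from pathlib import Path
from openai import OpenAI  # DeepSeek's API is OpenAI-compatible

client = OpenAI(api_key="YOUR_KEY", base_url="https://api.deepseek.com")
MODEL = "deepseek-v4-flash"  # placeholder; use whatever identifier the V4 docs list

# ~20k tokens of instructions that never change between calls (hypothetical file)
SYSTEM_PROMPT = Path("agent_system_prompt.txt").read_text()

def ask(question: str) -> str:
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},  # billed as a cache hit after call 1
            {"role": "user", "content": question},         # only the new tokens pay the miss rate
        ],
    )
    return resp.choices[0].message.content
```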

For anyone building agents or RAG pipelines, this is the part of the pricing docs that matters. A 20,000-token system prompt on an agent that handles 100 queries a day stops being a line item you watch. Long-context documents, tool schemas, few-shot examples, chat history: all of it gets billed at near-zero rates after the first call.
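To put rough numbers on that scenario, here is a back-of-envelope sketch using the V4-Pro input rates quoted above; the prompt size, query volume, and perfect hit rate are this article's hypothetical, not DeepSeek's figures.

```python
# Back-of-envelope input cost for a static 20,000-token system prompt,
# using the V4-Pro per-million-token input rates quoted above.
PRICE_MISS_USD = 1.74   # per 1M input tokens, cache miss
PRICE_HIT_USD = 0.145   # per 1M input tokens, cache hit

PROMPT_TOKENS = 20_000    # shared prefix sent with every request
QUERIES_PER_DAY = 100
DAYS = 30

def monthly_prefix_cost(price_per_million: float) -> float:
    tokens = PROMPT_TOKENS * QUERIES_PER_DAY * DAYS
    return tokens / 1_000_000 * price_per_million

print(f"all misses: ${monthly_prefix_cost(PRICE_MISS_USD):.2f}/month")  # $104.40
print(f"all hits:   ${monthly_prefix_cost(PRICE_HIT_USD):.2f}/month")   # $8.70
# Ignores the first (miss-priced) call, per-query user tokens, and output tokens.
```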

The competitive math

Even at full price, V4-Pro already undercut GPT-5.5, Claude Opus 4.7, and Gemini 3.1 Pro on per-token cost, per TNW's analysis. Both the promotional discount and the cache cut compound on top of that. The Flash variant, at $0.14 input and $0.28 output per million tokens, sits below GPT-5.4 Nano and Claude Haiku 4.5.

That comparison carries caveats DeepSeek itself acknowledges. The company says V4 trails GPT-5.4 and Gemini 3.1 Pro by three to six months on capability benchmarks. Artificial Analysis ranks V4-Pro at #2 on its open-weights reasoning index, behind Kimi K2.6. If your workload depends on frontier-grade reasoning or million-token retrieval accuracy, the savings won't close the quality gap. If it depends on running a lot of structured prompts cheaply, the math is hard to argue with.

Why now

DeepSeek's cut arrives three days after the V4-Pro and V4-Flash preview release on April 24, the company's first major model drop since December. The timing matters: a US crackdown targeting Chinese AI model distillation was announced the day before V4 launched. Cutting prices is a competitive move and a political one.

V4 runs on Huawei Ascend hardware rather than Nvidia, and V4-Pro reportedly requires roughly 27% of V3.2's compute for a 1M-token context window. If that figure holds up, the efficiency gain is presumably what makes the price cut economically viable rather than purely promotional.

V4-Pro's 75% discount expires May 5 at 15:59 UTC. The cache pricing change is permanent, at least until DeepSeek decides otherwise.

Tags: DeepSeek, DeepSeek V4, V4-Pro, API pricing, prompt caching, LLM API, AI agents, RAG, cache pricing
Oliver Senti

Senior AI Editor

Former software engineer turned tech writer, Oliver has spent the last five years tracking the AI landscape. He brings a practitioner's eye to the hype cycles and genuine innovations defining the field, helping readers separate signal from noise.


