
DeepSeek Cuts Input Cache Prices 90% Across All API Models

Cache hits now bill at one-tenth the launch rate. V4-Pro's 75% promotional discount runs through May 5.

Oliver Senti
Senior AI Editor
April 27, 2026 · 2 min read

DeepSeek announced Monday that input cache hit prices across its API are now one-tenth of their original rates, with the change applying to the entire V4 model lineup. The 75% promotional discount on the flagship V4-Pro also remains active through May 5.

What the math actually does

On V4-Flash, cache hits now run $0.028 per million input tokens, against $0.14 on a cache miss. On V4-Pro, it's $0.145 cached versus $1.74 uncached. That works out to 80 to 92 percent off the cache-miss rate for anything the model has already seen, applied automatically whenever a prompt prefix matches a cached version. No flags. No special endpoints.
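The mechanics are worth spelling out. Here is a minimal sketch, assuming V4 keeps the OpenAI-compatible interface DeepSeek's API already exposes; the model identifier and file path are placeholders, not confirmed V4 names. The only requirement is keeping the long, static prefix byte-identical across calls and putting the variable input last.

```python
# Sketch: cache hits come from reusing an identical prompt prefix, so keep the
# static parts (system prompt, tool schemas, few-shot examples) unchanged and
# append the variable user input at the end of the request.
from pathlib import Path
from openai import OpenAI  # DeepSeek's API is OpenAI-compatible

client = OpenAI(api_key="YOUR_KEY", base_url="https://api.deepseek.com")
MODEL = "deepseek-v4-flash"  # placeholder; use whatever identifier the V4 docs list

# ~20k tokens of instructions that never change between calls (hypothetical file)
SYSTEM_PROMPT = Path("agent_system_prompt.txt").read_text()

def ask(question: str) -> str:
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},  # billed as a cache hit after call 1
            {"role": "user", "content": question},         # only the new tokens pay the miss rate
        ],
    )
    return resp.choices[0].message.content
```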

For anyone building agents or RAG pipelines, this is the part of the pricing docs that matters. A 20,000-token system prompt on an agent that handles 100 queries a day stops being a line item you watch. Long-context documents, tool schemas, few-shot examples, chat history: all of it gets billed at near-zero rates after the first call.
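To put rough numbers on that scenario, here is a back-of-envelope sketch using the V4-Pro input rates quoted above; the prompt size, query volume, and perfect hit rate are this article's hypothetical, not DeepSeek's figures.

```python
# Back-of-envelope input cost for a static 20,000-token system prompt,
# using the V4-Pro per-million-token input rates quoted above.
PRICE_MISS_USD = 1.74   # per 1M input tokens, cache miss
PRICE_HIT_USD = 0.145   # per 1M input tokens, cache hit

PROMPT_TOKENS = 20_000    # shared prefix sent with every request
QUERIES_PER_DAY = 100
DAYS = 30

def monthly_prefix_cost(price_per_million: float) -> float:
    tokens = PROMPT_TOKENS * QUERIES_PER_DAY * DAYS
    return tokens / 1_000_000 * price_per_million

print(f"all misses: ${monthly_prefix_cost(PRICE_MISS_USD):.2f}/month")  # $104.40
print(f"all hits:   ${monthly_prefix_cost(PRICE_HIT_USD):.2f}/month")   # $8.70
# Ignores the first (miss-priced) call, per-query user tokens, and output tokens.
```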

The competitive math

Even at full price, V4-Pro already undercut GPT-5.5, Claude Opus 4.7, and Gemini 3.1 Pro on per-token cost, per TNW's analysis. Both the promotional discount and the cache cut compound on top of that. The Flash variant, at $0.14 input and $0.28 output per million tokens, sits below GPT-5.4 Nano and Claude Haiku 4.5.

That comparison carries caveats DeepSeek itself acknowledges. The company says V4 trails GPT-5.4 and Gemini 3.1 Pro by three to six months on capability benchmarks. Artificial Analysis ranks V4-Pro at #2 on its open-weights reasoning index, behind Kimi K2.6. If your workload depends on frontier-grade reasoning or million-token retrieval accuracy, the savings won't close the quality gap. If it depends on running a lot of structured prompts cheaply, the math is hard to argue with.

Why now

DeepSeek's cut arrives three days after the V4-Pro and V4-Flash preview release on April 24, the company's first major model drop since December. The timing matters: a US crackdown targeting Chinese AI model distillation was announced the day before V4 launched. Cutting prices is a competitive move and a political one.

V4 runs on Huawei Ascend hardware rather than Nvidia, and V4-Pro reportedly requires roughly 27% of V3.2's compute for a 1M-token context window. If that figure holds up, the efficiency gain is presumably what makes the price cut economically viable rather than purely promotional.

V4-Pro's 75% discount expires May 5 at 15:59 UTC. The cache pricing change is permanent, at least until DeepSeek decides otherwise.

Tags: DeepSeek, DeepSeek V4, V4-Pro, API pricing, prompt caching, LLM API, AI agents, RAG, cache pricing
Oliver Senti

Senior AI Editor

Former software engineer turned tech writer, Oliver has spent the last five years tracking the AI landscape. He brings a practitioner's eye to the hype cycles and genuine innovations defining the field, helping readers separate signal from noise.


