DeepSeek announced Monday that input cache hit prices across its API are now one-tenth of their original rates, with the change applying to the entire V4 model lineup. Its 75% promotional discount on flagship V4-Pro also remains active through May 5.
What the math actually does
On V4-Flash, cache hits now run $0.028 per million input tokens, against $0.14 on a cache miss. On V4-Pro, it's $0.145 cached versus $1.74 uncached. That works out to an 80 to 92 percent discount, depending on the model, on anything the model has already seen, applied automatically whenever a prompt prefix matches a cached version. No flags. No special endpoints.
For anyone building agents or RAG pipelines, this is the part of the pricing docs that matters. A 20,000-token system prompt on an agent that handles 100 queries a day stops being a line item you watch. Long context documents, tool schemas, few-shot examples, chat history: all of it gets billed at near-zero rates after the first call.
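To make the math above concrete, here's a back-of-envelope sketch using the per-million-token rates quoted in this article. The cache model is a simplifying assumption for illustration: the first call each day misses the cache and every subsequent call hits it; real hit rates depend on prefix stability and cache eviction.

```python
# Rates per million input tokens, as quoted in the article.
RATES = {
    "V4-Flash": {"hit": 0.028, "miss": 0.14},
    "V4-Pro":   {"hit": 0.145, "miss": 1.74},
}

def monthly_prefix_cost(model, prefix_tokens, queries_per_day, days=30):
    """Cost of re-sending the same prompt prefix every call, assuming
    the first call each day misses the cache and the rest hit it."""
    r = RATES[model]
    per_day = (prefix_tokens / 1e6) * (r["miss"] + (queries_per_day - 1) * r["hit"])
    return per_day * days

# The article's example: a 20,000-token system prompt, 100 queries a day.
for model, r in RATES.items():
    cached = monthly_prefix_cost(model, 20_000, 100)
    uncached = (20_000 / 1e6) * r["miss"] * 100 * 30
    print(f"{model}: ${cached:.2f}/mo with caching vs ${uncached:.2f}/mo without")
```

On these assumptions, the Flash prefix bill drops from a few dollars a month to under two, and the Pro prefix bill falls by an order of magnitude, which is the sense in which a large static system prompt "stops being a line item you watch."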
The competitive math
Even at full price, V4-Pro already undercut GPT-5.5, Claude Opus 4.7, and Gemini 3.1 Pro on per-token cost, per TNW's analysis. Both the promotional discount and the cache cut compound on top of that. The Flash variant at $0.14 input and $0.28 output already sits below GPT-5.4 Nano and Claude Haiku 4.5.
That comparison carries caveats DeepSeek itself acknowledges. The company says V4 trails GPT-5.4 and Gemini 3.1 Pro by three to six months on capability benchmarks. Artificial Analysis ranks V4-Pro at #2 on its open weights reasoning index, behind Kimi K2.6. If your workload depends on frontier-grade reasoning or million-token retrieval accuracy, the savings won't close the quality gap. If it depends on running a lot of structured prompts cheaply, the math is hard to argue with.
Why now
DeepSeek's cut arrives three days after the V4-Pro and V4-Flash preview release on April 24, the company's first major model drop since December. The timing matters: a US crackdown targeting Chinese AI model distillation was announced the day before V4 launched. Cutting prices is a competitive move and a political one.
V4 runs on Huawei Ascend hardware rather than Nvidia, and V4-Pro reportedly requires roughly 27% of V3.2's compute for a 1M-token context window. If those figures hold, that efficiency gain is presumably what makes the price cut economically viable rather than purely promotional.
V4-Pro's 75% discount expires May 5 at 15:59 UTC. The cache pricing change is permanent, at least until DeepSeek decides otherwise.