§ log.005 — DeepSeek cut V4-Pro to $0.036/MTok. That's 75% off.

3 min read · 520 words · AI · DeepSeek · pricing · frontier-models

DeepSeek slashed V4-Pro input pricing by 75% through May 5 and permanently dropped cache-hit pricing to one-tenth across all their APIs. At $0.036/MTok, V4-Pro input is now cheaper per token than V4-Flash input ($0.14/MTok).

DeepSeek announced price cuts on April 26 that fundamentally change the math on V4. If you read my last post, where I compared V4-Pro at $1.74/$3.48 through OpenRouter, the direct API pricing just got a lot more interesting.

what actually changed

Two things. One temporary, one permanent.

Temporary — 75% off V4-Pro input. Through May 5, V4-Pro input tokens drop from ~$0.145/MTok to ~$0.036/MTok. The promotional pricing is shorter than a free trial at Planet Fitness, so if you want to test it, do it now.

Permanent — cache-hit pricing cut 10x. Across every DeepSeek API. If your workload has a lot of repeated or near-identical queries — common in production agent deployments — this is the bigger story. The per-token cost on cached inputs is now essentially negligible.

V4-Flash pricing didn't change. It stays at $0.14/$0.28 per MTok (input/output).

what this means in practice

The numbers side by side:

Model                          Input /MTok   Output /MTok
V4-Pro (direct, promotional)   $0.036        $1.10
V4-Pro (direct, standard)      $0.145        $1.10
V4-Pro (OpenRouter)            $1.74         $3.48
V4-Flash (anywhere)            $0.14         $0.28

V4-Pro is now cheaper on input than V4-Flash. $0.036/MTok for a 1.6T MoE model with 49B active params and 1M context is absurd. An agent that burns through 100M input tokens in a month pays $3.60 for that input.
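
The arithmetic behind that $3.60, as a rough sketch. The prices are the ones in the table above; the output volumes are made up for illustration, and `monthly_cost` is a hypothetical helper, not part of any SDK.

```python
# Per-million-token prices in USD, copied from the table above: (input, output).
PRICES = {
    "V4-Pro (direct, promotional)": (0.036, 1.10),
    "V4-Pro (direct, standard)": (0.145, 1.10),
    "V4-Pro (OpenRouter)": (1.74, 3.48),
    "V4-Flash (anywhere)": (0.14, 0.28),
}


def monthly_cost(input_mtok, output_mtok, price):
    """USD cost for a monthly volume given in millions of tokens.

    `price` is an (input, output) pair in USD per million tokens.
    """
    input_price, output_price = price
    return input_mtok * input_price + output_mtok * output_price


# Assumed workload: 100M input tokens; the output volumes are illustrative.
for output_mtok in (0, 10, 20):
    print(f"100M in / {output_mtok}M out")
    for model, price in PRICES.items():
        print(f"  {model:<30} ${monthly_cost(100, output_mtok, price):,.2f}")
```

On input alone, promotional Pro comes to $3.60 against $14.00 for Flash. The output side runs the other way: at these prices Flash becomes the cheaper total once output exceeds roughly 13% of input volume, so the comparison depends on how much your agent writes back.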

The OpenRouter markup on Pro was always steep: 12x the direct price on input ($1.74 vs. $0.145). If OpenRouter's price holds, the promotion stretches that to roughly 48x. If you're routing through OpenRouter, Flash is still the sensible default for most agentic work, and it's worth setting up the DeepSeek API directly if you want Pro's world-knowledge performance.
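
A quick check on the markup, assuming the OpenRouter input price stays at $1.74/MTok for the length of the promotion (the announcement only covers direct API pricing, so that part is my assumption). `markup` is a hypothetical helper.

```python
# Input prices in USD per million tokens, from the table above.
DIRECT_STANDARD = 0.145
DIRECT_PROMO = 0.036
OPENROUTER = 1.74  # assumed unchanged during the promotion


def markup(reseller_price, direct_price):
    """How many times the direct price the reseller charges."""
    return reseller_price / direct_price


print(f"vs. standard direct:    {markup(OPENROUTER, DIRECT_STANDARD):.0f}x")
print(f"vs. promotional direct: {markup(OPENROUTER, DIRECT_PROMO):.0f}x")
```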

cache-hit pricing matters more than people realize

The permanent 10x cut on cache hits is the understated part of this announcement. In practice, many production workloads, especially agents that repeatedly call similar prompts, can achieve 50-70% cache hit rates. With the new pricing, a heavily cached V4-Pro workload could average sub-$0.01/MTok effective input cost during the promotion, though that takes a hit rate above roughly 72% even if cached tokens were free, so you need the top of that range or better.
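
To see how the hit rate moves the blended input price, here is a minimal sketch. The uncached price is the promotional $0.036/MTok from above. The cached price is a placeholder I picked (one-tenth of the uncached price), not a published number, so swap in the real rate from DeepSeek's pricing page. Both function names are hypothetical.

```python
MISS_PRICE = 0.036             # USD/MTok, promotional V4-Pro input (from the post)
PLACEHOLDER_HIT_PRICE = 0.0036  # USD/MTok, ASSUMPTION: not a published price


def effective_input_price(hit_rate, miss_price, hit_price):
    """Blended USD/MTok input price for a given cache hit rate (0.0 to 1.0)."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_price + (1.0 - hit_rate) * miss_price


def min_hit_rate_for(target, miss_price, hit_price):
    """Smallest hit rate whose blended price is at or below `target`."""
    if target >= miss_price:
        return 0.0  # already under target with no caching at all
    if hit_price > target:
        raise ValueError("target is unreachable at this cached price")
    return (miss_price - target) / (miss_price - hit_price)


for rate in (0.5, 0.7, 0.9):
    blended = effective_input_price(rate, MISS_PRICE, PLACEHOLDER_HIT_PRICE)
    print(f"hit rate {rate:.0%}: ${blended:.4f}/MTok")

# Lower bound: the hit rate needed for $0.01/MTok if cached tokens cost nothing.
print(f"{min_hit_rate_for(0.01, MISS_PRICE, 0.0):.1%}")
```

Under these assumptions, a 50-70% hit rate lands well below list price but not under $0.01/MTok. That threshold needs a hit rate in the low 70s at minimum, and closer to 80% with the placeholder cached price.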

That's getting close to local inference territory without the hardware cost.

the broader picture

Chinese open-weight models are embedded in ~80% of US AI startups now. Chinese models cost one-sixth to one-fourth of US rivals. And the White House is publicly accusing foreign actors of systematic model distillation.

I'm not going to litigate geopolitics in a blog post. As a bootstrapped builder, I care about results and cost. V4-Pro at $0.036/MTok is a real option for workloads where I was previously looking at much more expensive providers. The quality is there (80%+ SWE-bench, competitive world knowledge). The price is now aggressively below everything else.

WrapsRL runs on image generation, not text-only agent pipelines. But every dollar I save on LLM inference goes toward the compute that actually generates revenue. I'll be setting up direct DeepSeek API access this week to evaluate whether Pro's world-knowledge improvements translate to better decal prompt refinement.

For the price of a coffee, I can run an assistant for a week. Hard to argue with that.

$ echo "frontier model at commodity pricing"

frontier model at commodity pricing

§end of file