284 Billion Parameters, 13 Billion Doing Work

DeepSeek priced V4-Flash at $0.14 per million input tokens when it launched April 24, running 284 billion parameters but activating 13 billion per token.

The design is Mixture-of-Experts. Each token routes through a small subset of specialized modules. Compute cost per forward pass depends on the activated count (13 billion), with the remaining 271 billion sitting idle in memory.

A 58-page technical report released April 26 disclosed a second lever. The KV cache, the working memory holding prior token context, compressed to 10% of V3.2's size. At that ratio, the cache fits on SSDs rather than GPU high-bandwidth memory.

Together the two changes drove single-token FLOPs for long-context inference to 27% of V3.2. The $0.14 reflects that arithmetic.

Two days after launch, DeepSeek cut cache-hit input prices to $0.0028 per million. Cache hits skip the GPU forward pass entirely; the bill becomes memory bandwidth and storage.

Measuring the Spread

Anthropic charges $3.00 per million for Claude Sonnet 4.6. OpenAI's GPT-4o runs $2.50 per million. On input tokens, the spread to V4-Flash's $0.14 is 21-fold and 18-fold, respectively.

V4-Flash output runs $0.28 per million; GPT-4o charges $10.00. At a 1:3 input-to-output ratio, the blended gap widens to roughly 33-fold.

Density explains part of that spread. Claude Sonnet and GPT-4o use dense architectures with larger activated parameter counts per token, which means lower throughput per GPU-hour and a higher marginal rate.

KV cache design explains the remainder for long-context tasks. An H100 holds 80 gigabytes of high-bandwidth memory, and a large dense model at one million tokens of context consumes a meaningful share. At long context, the per-token price is an architecture constraint.

DeepInfra achieved $0.05 per million on Nvidia Blackwell using NVFP4 quantization in February 2026, running large-scale MoE models. If DeepSeek's $0.14 sits above that floor, the gap is margin.

V4-Pro is the denser sibling: 1.6 trillion total parameters, 49 billion active. Its current price of $0.435 per million reflects a 75% promotional discount expiring May 31.

The Lab That Bought Time

OpenAI generated $13.1 billion in revenue in 2025 and spent roughly $22 billion, a net loss of about $9 billion. Its own projections anticipate $74 billion in operating losses by 2028 before reaching profitability in 2030.

The Stargate joint venture launched at $500 billion in January 2025. By May 2026, the $1.4 trillion spending target had been cut to $600 billion. OpenAI transferred its Norway data center lease to Microsoft and canceled a 600-megawatt Abilene, Texas expansion.

The company now describes Stargate as an "umbrella" for its compute strategy and leases capacity from Amazon, Microsoft, and Google Cloud.

Enterprise GPU deployments typically run at 5% to 20% utilization, a ceiling that describes commercial deployments, not hyperscale inference operators. For OpenAI, the binding constraint is depreciation: every dollar of H100 hardware in a dense-architecture fleet amortizes across far fewer FLOPs per token than a MoE router delivers.

For investors reviewing OpenAI ahead of the late-2026 listing window that CFO Sarah Friar has discussed publicly, the $0.14-to-$2.50 input gap demands an architecture explanation. The $0.05 Blackwell floor puts a specific number on that demand: a 50-fold spread between the hardware reference rate and OpenAI's current ask.

V4-Pro's standard rate takes effect June 1 at $1.74 per million input tokens. That narrows the input gap to GPT-4o to $0.76, which is 44 percent of V4-Pro's own rate, not 30 percent of GPT-4o's. Whether DeepSeek extends the promotional discount or lets it expire on schedule will show whether $1.74 covers the cost of 49 billion active parameters.