Tech / AI
Anthropic Cut Opus 67%. The Meter Runs 35% Faster.
Anthropic cut Opus 67% in November 2025 and its inference margin expanded from 38% to 70% by May 2026. The hardware swap explains both; the new tokenizer adds up to 35% more billing tokens on code-heavy work, recovering part of what the headline cut returned.

Anthropic cut its Opus API rate 67% in November 2025, from $15 to $5 per million input tokens. By May 2026, its inference margin had expanded from 38% to 70%. The compute cost behind that $5 price is $1.50.
The Chip-Hour Math
SemiAnalysis reported in May 2026 that Anthropic's inference margin reached 70%, up from 38% a year earlier. At $5 per million tokens, 30% compute cost is $1.50, down from $3.10 at the prior margin. The gap is the hardware swap.
Anthropic's contracted TPUv7 Ironwood rate runs roughly $1.60 per TPU-hour, against H100 spot rates near $2.40 per chip-hour. For calibration: an 8-GPU H100 pod at $19.20 per hour generates roughly 10 million tokens per hour on a 70B model, a rate of $1.90 per million. Claude Opus is substantially larger; at near-retail H100 rates, the old $3.10 compute cost fits that throughput profile.
TPUv7 pods connect 9,216 chips on a single fabric, against 72 GPUs per NVL72 cluster. More inference traffic stays on fast interconnect; effective utilization rises. The 33% raw chip cost reduction plus batch-efficiency gain explains the 52% fall from $3.10 to $1.50, achieved even after the headline rate dropped 67%.
The Tokenizer Offset
Opus 4.7's new tokenizer produces up to 35% more tokens from identical input, with a 1.0x to 1.35x multiplier running highest on code and structured data. The rate card holds at $5 per million tokens. At the top of that range, a million-token budget processes about 26% less text than it did on the prior model.
The tokenizer is a revenue-side effect. Anthropic earns more billing tokens per equivalent workload while compute scales only with actual token count.
What OpenAI Pays
OpenAI's full-year 2025 inference bill reached $8.4 billion, against $13.1 billion in total revenues and a company-wide adjusted gross margin of 33%.
The H1 picture was starker. Azure billing records reviewed by Ed Zitron showed $5.02 billion in inference spending in the first half of 2025, against $4.3 billion in total revenues for the same period. Inference costs exceeded all company revenue; OpenAI runs on Azure spot rates with no contracted-fleet cost offset.
The November price cut points to a structural advantage that compounds. Anthropic's per-token cost fell on hardware and rose on billing simultaneously. Both levers remain unavailable to a competitor running on Azure spot rates.
Anthropic committed $100 billion to AWS over ten years in April, including nearly 1 GW of Trainium3 capacity coming online by December 2026. If the rate card holds at $5 when that hardware lands, any provider paying spot rates faces a gap that no model improvement closes.