HeadlinesBriefing favicon HeadlinesBriefing.com

AI Agents Waste Millions Recomputing Identical KV Caches

Hacker News •
×

Every AI agent reading the same document repeats the same expensive computation, rebuilding identical key-value caches from scratch. This prefill step consumes the most compute during inference, yet produces the same result millions of times over. Researchers propose a simple fix: compute once, sell access.

The approach lets publishers precompute document KV caches and license them to other agents. Testing on Qwen3-4B shows reuse costs 9-50x less than prefill, with savings increasing for longer texts. Results match exactly - all 24 greedy tokens align, with identical logits. A single reuse already pays back the initial investment.

Shipping precomputed caches fails because KV data is nearly incompressible, making egress costs exceed prefill savings. Hosting provider-side eliminates this issue, mirroring production prompt-caching systems. Serving one popular 3774-token document to 80 million agents costs $1.5M with repeated prefill but only $30,000 through cache reuse - a 49.7x reduction.

Current cache-read tariffs already provide 10x discounts while operating within this efficiency envelope. The research frames an agent-native prefill CDN and identifies lossless KV compression plus cross-party payment systems as critical unsolved challenges. Millions of dollars in compute savings per popular document make this infrastructure opportunity impossible to ignore.