HeadlinesBriefing.com

Penguin Solutions Launches First CXL-Based KV Cache Server

TechPowerUp News

Penguin Solutions has unveiled the MemoryAI KV cache server, which it bills as the first production-ready solution to use CXL memory technology against the memory wall in AI inferencing. The server provides up to 11 TB of total memory by combining 3 TB of DDR5 main memory with up to eight 1 TB CXL Add-in Cards, and it is engineered for enterprise-scale inference workloads, including agentic AI.
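To put the capacity figure in context, the arithmetic can be sketched in a few lines of Python. This is an illustrative back-of-the-envelope estimate, not anything published by Penguin Solutions: the function names are hypothetical, the model shape (80 layers, 8 KV heads, head dimension 128, 16-bit values) is an assumed example, and the code treats 1 TB as 10^12 bytes because the article does not say which unit is meant. The per-token formula is the standard one for a transformer KV cache, which stores one key and one value vector per layer per KV head.

```python
TB = 10**12  # assumption: decimal terabytes; the article does not specify TB vs TiB


def total_memory_tb(ddr5_tb: int = 3, cxl_cards: int = 8, tb_per_card: int = 1) -> int:
    """Total memory from the article's figures: DDR5 plus CXL Add-in Cards."""
    return ddr5_tb + cxl_cards * tb_per_card


def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                       bytes_per_value: int = 2) -> int:
    """KV cache bytes for one token: a key and a value (factor 2) per layer per KV head."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value


def tokens_that_fit(capacity_tb: int, per_token_bytes: int) -> int:
    """Whole tokens of KV cache that fit in the given capacity."""
    return capacity_tb * TB // per_token_bytes


if __name__ == "__main__":
    # Hypothetical model shape, chosen only for illustration.
    per_token = kv_bytes_per_token(layers=80, kv_heads=8, head_dim=128)
    print("total memory (TB):", total_memory_tb())
    print("KV bytes per token:", per_token)
    print("tokens in 8 TB of CXL memory:", tokens_that_fit(8, per_token))
```

Under these assumptions each cached token costs a few hundred kilobytes, so terabytes of CXL memory correspond to tens of millions of cached tokens shared across all concurrent sessions. Real deployments will differ with model architecture, precision, and cache overhead.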

Unlike episodic model training, continuous inference demands are 70% memory-driven and 30% compute-driven, which creates bottlenecks and leaves GPUs idle. The MemoryAI server addresses this by expanding memory capacity, enabling faster time-to-first-token while reducing redundant re-compute operations. This allows organizations to serve larger contexts and more concurrent requests from the same GPUs.

The solution offers multiple operational benefits including support for larger context sizes and concurrency, flexibility to tier cluster memory at speeds 10x faster than NVMe-based approaches, and compatibility with NVIDIA Dynamo software. It also provides cost and power efficiency by optimizing GPU usage and drawing less power than equivalent GPU servers. Penguin Solutions' innovation builds on its high-performance computing expertise, with customers already deploying the technology to meet demanding latency SLAs for production AI workloads.