HeadlinesBriefing.com

AMD Instinct MI355X GPUs Break 1M Tokens/Sec in MLPerf 6.0

TechPowerUp News

AMD has surpassed 1 million tokens per second in MLPerf Inference 6.0 with its Instinct MI355X GPUs, a milestone for the company's AI inference performance. It reached that throughput running Llama 2 70B and GPT-OSS-120B at multinode scale, demonstrating production-ready throughput for large language model deployments.

The Instinct MI355X, built on the 3 nm CDNA 4 architecture with 185 billion transistors, supports FP4 and FP6 precision and offers up to 288 GB of HBM3E memory. This hardware foundation, combined with ROCm software optimizations, delivered 3.1x better performance than the previous-generation MI325X on Llama 2 70B Server workloads. In single-node results, the GPU was also competitive with NVIDIA's B200 and B300 across multiple benchmarks.

Beyond raw speed, AMD expanded its MLPerf coverage with first-time submissions on text-to-video generation using Wan-2.2-t2v, achieving 93% of NVIDIA B200 performance on day one. The company also demonstrated efficient multinode scaling, maintaining 93% scaling efficiency when expanding from one node to 11. At that scale, 87 GPUs delivered over 1 million tokens per second in production scenarios.
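
For readers who want to see what the scaling figure implies, here is a rough back-of-the-envelope sketch in Python. It uses only the numbers quoted above (87 GPUs, 93% efficiency, and 1 million tokens per second treated as a lower bound). The function names are illustrative, and normalizing efficiency per GPU rather than per node is an assumption, since the summary does not say how the 93% figure was computed.

```python
def scaling_efficiency(measured_throughput, baseline_per_unit, units):
    """Measured throughput divided by ideal linear-scaling throughput."""
    ideal = baseline_per_unit * units
    return measured_throughput / ideal


def per_unit_throughput(total_throughput, units):
    """Average throughput contributed by each unit (here, each GPU)."""
    return total_throughput / units


# Figures quoted in the article. The total is a floor ("over 1 million").
TOTAL_TOKENS_PER_SEC = 1_000_000
GPUS = 87
EFFICIENCY = 0.93

# Average per-GPU throughput at the 11-node scale.
per_gpu_at_scale = per_unit_throughput(TOTAL_TOKENS_PER_SEC, GPUS)

# Assumption: efficiency is normalized per GPU, so the implied
# single-node per-GPU baseline is the at-scale figure divided by 0.93.
per_gpu_baseline = per_gpu_at_scale / EFFICIENCY

print(f"Per-GPU throughput at scale: {per_gpu_at_scale:,.0f} tokens/sec")
print(f"Implied per-GPU baseline:    {per_gpu_baseline:,.0f} tokens/sec")
```

Because the headline total is a lower bound, both printed values should be read as minimums rather than measured results.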