HeadlinesBriefing.com

TorchTPU: Native PyTorch on TPUs

Hacker News

Google has introduced TorchTPU, which lets PyTorch run natively on TPUs at massive scale. The solution addresses growing demand for large-scale distributed AI training, letting developers use Google's Tensor Processing Units without extensive code changes. TorchTPU prioritizes usability while extracting maximum performance from the TPUs that power Google's Gemini and Veo platforms.

The engineering behind TorchTPU includes three distinct eager modes. The most significant breakthrough is Fused Eager mode, which automatically fuses operations to deliver 50% to 100%+ performance gains over Strict Eager. Developers can debug with Debug Eager, keep familiar single-op dispatch with Strict Eager, or maximize performance with Fused Eager, all backed by a shared compilation cache.
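The fused-eager idea can be illustrated in plain Python. This is a conceptual sketch only, not TorchTPU's actual API (which the article does not detail): instead of dispatching each elementwise operation to the device separately, a chain of ops is composed and executed as a single fused pass, cutting dispatch overhead while producing identical results.

```python
# Conceptual sketch of strict vs. fused eager dispatch.
# Illustrative only; function names here are NOT TorchTPU's API.

def strict_eager(data, ops):
    """Dispatch each op separately: one 'kernel launch' per op,
    each making a full pass over the data."""
    launches = 0
    for op in ops:
        data = [op(v) for v in data]
        launches += 1
    return data, launches

def fused_eager(data, ops):
    """Fuse the whole chain of elementwise ops into one pass:
    a single 'kernel launch' applies every op per element."""
    def fused(v):
        for op in ops:
            v = op(v)
        return v
    return [fused(v) for v in data], 1

ops = [lambda v: v * 2, lambda v: v + 1, lambda v: v * v]
data = [1.0, 2.0, 3.0]

strict_out, strict_launches = strict_eager(data, ops)
fused_out, fused_launches = fused_eager(data, ops)
assert strict_out == fused_out           # same numerical result
assert fused_launches < strict_launches  # fewer dispatches
```

The payoff on real hardware comes from fewer kernel launches and fewer round trips through device memory, which is where the reported 50% to 100%+ gains over per-op dispatch would originate.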

For peak performance, TorchTPU integrates with torch.compile using XLA as the backend compiler. The solution supports distributed training APIs such as DDP and FSDPv2 while handling MPMD (multiple program, multiple data) execution patterns. By mapping PyTorch operators directly to StableHLO, TorchTPU enables custom kernels written in Pallas and JAX, providing the hardware awareness needed to optimize models specifically for TPU tensor cores.
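At the heart of DDP-style data parallelism is a gradient all-reduce: each replica computes gradients on its shard of the batch, then all replicas average those gradients so every copy of the model applies the identical update. A minimal pure-Python sketch of that averaging step (illustrative only; real DDP uses torch.distributed collectives, not this code):

```python
# Conceptual sketch of the gradient all-reduce behind DDP.
# Illustrative only; names are NOT TorchTPU or torch.distributed API.

def all_reduce_mean(grads_per_replica):
    """Average gradients elementwise across replicas so that
    every replica ends up applying the same optimizer update."""
    n = len(grads_per_replica)
    return [sum(g) / n for g in zip(*grads_per_replica)]

# Two replicas, each holding gradients for three parameters.
replica0 = [1.0, -2.0, 4.0]
replica1 = [3.0,  2.0, 0.0]
avg = all_reduce_mean([replica0, replica1])
```

On TPUs this collective would run over the chip interconnect; the point of the sketch is only the contract: after the reduce, every replica holds the same averaged gradients.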