HeadlinesBriefing.com

AutoKernel: AI-Driven GPU Kernel Optimization for PyTorch Models

Hacker News

AutoKernel, an open-source project hosted on GitHub, automates GPU kernel optimization for PyTorch models using an autonomous AI agent. The tool profiles a model, identifies its performance bottlenecks, and iteratively optimizes Triton kernels with built-in correctness checks. Inspired by Andrej Karpathy's autoresearch project, it applies the same looped experimentation approach to hardware acceleration.

The system first profiles a PyTorch model and ranks its kernels by the GPU time they consume. It then extracts the top bottlenecks, such as matrix multiplication or attention, into standalone Triton kernels. An orchestrator prioritizes the work using Amdahl's law, so the kernels that account for the largest share of runtime are optimized first. Each experiment modifies kernel.py, runs benchmarks with 5-stage correctness verification, and keeps the change if performance improves or reverts it otherwise.
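The Amdahl's-law prioritization described above can be illustrated with a minimal Python sketch. This is not AutoKernel's actual code: the `KernelProfile` type, the `rank_by_potential` function, the example timings, and the assumed 2x per-kernel speedup are all hypothetical and chosen only to show why a kernel's share of total GPU time bounds the end-to-end gain.

```python
from dataclasses import dataclass


@dataclass
class KernelProfile:
    # Hypothetical record: a kernel name and the GPU time attributed to it.
    name: str
    gpu_time_ms: float


def amdahl_speedup(fraction: float, local_speedup: float) -> float:
    """Overall speedup when `fraction` of runtime becomes `local_speedup`x faster."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


def rank_by_potential(profiles, assumed_local_speedup=2.0):
    """Return (name, fraction, overall_speedup) tuples, best candidate first.

    `assumed_local_speedup` is an illustrative guess, not a measured value.
    """
    total = sum(p.gpu_time_ms for p in profiles)
    ranked = []
    for p in profiles:
        fraction = p.gpu_time_ms / total
        overall = amdahl_speedup(fraction, assumed_local_speedup)
        ranked.append((p.name, fraction, overall))
    # Highest end-to-end payoff first.
    ranked.sort(key=lambda row: row[2], reverse=True)
    return ranked


if __name__ == "__main__":
    # Made-up timings for illustration only.
    example = [
        KernelProfile("layernorm", 10.0),
        KernelProfile("matmul", 60.0),
        KernelProfile("attention", 30.0),
    ]
    for name, fraction, overall in rank_by_potential(example):
        print(f"{name}: {fraction:.0%} of GPU time, up to {overall:.2f}x overall")
```

With these made-up numbers, doubling the speed of a kernel that takes 10% of GPU time yields only about a 5% end-to-end gain, which is why such a kernel would be ranked last.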

AutoKernel supports nine core deep learning operations, including matmul, flash attention, and fused MLP, and is reported to reach 80-95% of cuBLAS performance while maintaining numerical stability. The framework includes pre-configured models such as LLaMA 7B and GPT-2, and requires only an NVIDIA GPU and Python 3.10+. Every experiment is logged to a human-readable TSV file, so researchers can track progress across thousands of iterations.
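To make the keep-or-revert loop and its TSV log more concrete, here is a small sketch in Python. It rests on several assumptions: the column names, the `run_experiments` function, and the candidate tuples are hypothetical, and latencies and correctness flags are passed in directly instead of being measured on a GPU. AutoKernel's real log schema is not described in this summary.

```python
import csv
import io


def run_experiments(candidates, baseline_ms, out):
    """Log each candidate to `out` as TSV and return the best latency kept.

    `candidates` is an iterable of (description, latency_ms, passed_correctness).
    A change is kept only if it passed correctness checks AND beat the best
    latency so far; otherwise it is recorded as reverted.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    # Hypothetical column names, chosen for readability.
    writer.writerow(["iteration", "description", "latency_ms", "correct", "decision"])
    best_ms = baseline_ms
    for i, (desc, latency_ms, correct) in enumerate(candidates, start=1):
        keep = correct and latency_ms < best_ms
        if keep:
            best_ms = latency_ms
        decision = "keep" if keep else "revert"
        writer.writerow([i, desc, f"{latency_ms:.3f}", correct, decision])
    return best_ms


if __name__ == "__main__":
    # Made-up experiments for illustration only.
    trials = [
        ("larger tile size", 9.0, True),
        ("aggressive fusion", 7.5, False),  # faster but numerically wrong
        ("loop unrolling", 8.0, True),
        ("extra prefetch", 8.5, True),      # correct but slower than best
    ]
    buffer = io.StringIO()
    best = run_experiments(trials, baseline_ms=10.0, out=buffer)
    print(buffer.getvalue())
    print(f"best latency kept: {best} ms")
```

Note that a faster but incorrect candidate is still reverted: in this sketch correctness gates the decision before performance is considered, matching the verify-then-keep order the summary describes.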

This project demonstrates how autonomous agents can methodically explore optimization spaces in deep learning systems. By combining PyTorch's flexibility with Triton's performance, it offers a practical path to accelerated AI inference without manual kernel tuning expertise.