
GPU Cluster Revolutionizes Autoresearch: 16 GPUs Cut Search Time 9x

Hacker News

Andrej Karpathy's Autoresearch project, in which an AI agent autonomously improves neural network training scripts, achieved a dramatic breakthrough when scaled across a 16-GPU cluster. Running for eight hours, the agent submitted approximately 910 experiments. Crucially, it discovered that scaling model width mattered more than tuning any individual hyperparameter, driving validation bits per byte (val_bpb) down from 1.003 to 0.974 – a 2.87% improvement over the baseline.
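For readers unfamiliar with the metric: bits per byte normalizes cross-entropy by the raw byte count of the validation text, which keeps runs with different tokenizers comparable. Here is a minimal sketch of the standard computation; the exact normalization inside Autoresearch is assumed, not confirmed by the post:

```python
import math

def val_bpb(total_loss_nats: float, total_bytes: int) -> float:
    """Bits per byte: summed cross-entropy over the validation set,
    converted from nats to bits, divided by raw bytes of text."""
    return total_loss_nats / math.log(2) / total_bytes

# The reported drop from 1.003 to 0.974 means the model needs about
# 0.029 fewer bits to encode each byte of validation text.
print(round(val_bpb(total_loss_nats=6.953e8, total_bytes=10**9), 3))  # 1.003
```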

This wasn't just raw speed; parallel execution fundamentally altered the agent's search strategy. Moving beyond sequential hill-climbing, the agent ran factorial grids of 10-13 experiments per wave, uncovering interaction effects between parameters that sequential methods would miss. For instance, testing six model widths simultaneously let it spot the trend immediately and focus refinement on the promising region, as in the sketch below.
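A minimal sketch of one such wave, assuming a hypothetical 6x2 grid of model widths and learning rates; the parameter values and toy scoring function are illustrative, not taken from the project's logs:

```python
from itertools import product
from concurrent.futures import ProcessPoolExecutor

# Hypothetical search space -- illustrative values, not the actual grid.
WIDTHS = [512, 640, 768, 896, 1024, 1280]   # six widths screened in one wave
LRS = [3e-4, 6e-4]                          # crossed with two learning rates

def run_experiment(cfg):
    """Stand-in for one training run; returns a toy val_bpb so the
    sketch executes. A real agent would launch training here."""
    width, lr = cfg
    return 1.003 - 0.00002 * width + 40.0 * (lr - 4.5e-4) ** 2  # fake landscape

if __name__ == "__main__":
    # One wave: a 6x2 factorial grid (12 runs) dispatched at once, so
    # width/lr interaction effects show up immediately instead of being
    # discovered one hill-climbing step at a time.
    wave = list(product(WIDTHS, LRS))
    with ProcessPoolExecutor(max_workers=16) as pool:
        scores = dict(zip(wave, pool.map(run_experiment, wave)))
    best = min(scores, key=scores.get)      # lowest val_bpb wins the wave
    print(f"best config this wave: width={best[0]}, lr={best[1]}")
```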

The agent also exploited heterogeneous hardware, using cheaper H100s for initial screening and promoting only the winners to H200s for full validation. This approach let the parallel agent reach the same best validation loss 9x faster than the simulated sequential baseline (~8 hours vs. ~72 hours).
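A compact sketch of that screen-then-promote pattern; the tier split, the top_k cutoff, and the scoring functions are assumptions standing in for short H100 screening runs and longer H200 validation runs:

```python
def screen_and_promote(configs, screen_fn, validate_fn, top_k=3):
    """Screen every config cheaply (e.g. short runs on H100s), then
    re-run only the top_k finalists on the premium tier (e.g. H200s)."""
    finalists = sorted(configs, key=screen_fn)[:top_k]  # cheap first pass
    return min(finalists, key=validate_fn)              # full validation run

# Toy usage: model widths as configs, fake val_bpb landscapes per tier.
widths = [512, 640, 768, 896, 1024, 1280]
h100_screen = lambda w: abs(w - 900) / 1000 + 0.97   # noisy short run
h200_check  = lambda w: abs(w - 896) / 1000 + 0.974  # longer, cleaner run
print(screen_and_promote(widths, h100_screen, h200_check))  # 896
```

The economics are the point of the design: most candidates die cheaply on the screening tier, so the expensive GPUs only ever see configurations that already look promising.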