HeadlinesBriefing.com

NanoGPT Slowrun Challenges Data Scaling Limits

Hacker News

Q Labs has launched NanoGPT Slowrun, an open repository for data-efficient learning algorithms that challenges the current scaling paradigm. The project addresses a fundamental bottleneck: while compute grows exponentially, data availability remains limited, and that imbalance increasingly constrains model intelligence. The constraint is especially acute in fields such as robotics and biology, where massive compute cannot compensate for data scarcity.

NanoGPT Slowrun operates on a simple premise: train on 100M tokens from FineWeb using unlimited compute, with the lowest validation loss determining the winner. Unlike speedrun benchmarks that optimize wall-clock time, Slowrun encourages expensive but potentially powerful approaches such as heavy regularization, second-order optimizers, and alternatives to gradient descent. The baseline achieves 2.4x data efficiency compared to modded-nanogpt.
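To make the protocol concrete, here is a minimal sketch of what a Slowrun-style evaluation loop might look like in PyTorch. The names (slowrun_score, TOKEN_BUDGET, SEQ_LEN) are illustrative assumptions, not taken from the repository; the sketch only encodes the rules stated above: the model may see at most 100M distinct training tokens, compute (epochs) is unbounded, and the score is validation loss.

```python
import torch
import torch.nn.functional as F

TOKEN_BUDGET = 100_000_000  # hard cap on distinct FineWeb training tokens
SEQ_LEN = 1024              # illustrative context length

def slowrun_score(model, train_tokens, val_tokens, optimizer, epochs=10):
    """Train under Slowrun-style rules, return the score (validation loss).

    train_tokens / val_tokens: 1-D LongTensors of token ids.
    model: maps (batch, seq) token ids -> (batch, seq, vocab) logits.
    """
    assert train_tokens.numel() <= TOKEN_BUDGET, "training data exceeds budget"
    n = train_tokens.numel() // (SEQ_LEN + 1)
    seqs = train_tokens[: n * (SEQ_LEN + 1)].view(n, SEQ_LEN + 1)

    for _ in range(epochs):            # compute is unlimited: re-epoch freely
        for i in torch.randperm(n):    # fresh shuffle every epoch
            x, y = seqs[i, :-1], seqs[i, 1:]
            logits = model(x.unsqueeze(0))          # (1, SEQ_LEN, vocab)
            loss = F.cross_entropy(logits.view(-1, logits.size(-1)), y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

    with torch.no_grad():              # lowest validation loss wins
        m = val_tokens.numel() // (SEQ_LEN + 1)
        v = val_tokens[: m * (SEQ_LEN + 1)].view(m, SEQ_LEN + 1)
        logits = model(v[:, :-1])
        return F.cross_entropy(
            logits.reshape(-1, logits.size(-1)), v[:, 1:].reshape(-1)
        ).item()
```

Under these rules, anything that lowers validation loss is fair game, however expensive per step, which is what makes second-order optimizers and heavy ensembles viable entries.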

Community contributions have rapidly advanced the project, pushing data efficiency from 2.4x to 5.5x within days through techniques like epoch shuffling, learned projections, SwiGLU activations, and model ensembling (the latter two are sketched below). The team believes 10x efficiency is achievable soon, with 100x possible by year's end. Q Labs identifies several promising directions: second-order optimizers, diffusion models, curriculum learning, and compression-focused approaches. The project represents a fundamental shift toward algorithms that can learn effectively with limited data but abundant compute.
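Of the techniques credited above, SwiGLU is the most self-contained to illustrate. Below is a minimal sketch of a SwiGLU feed-forward block as it might replace the standard GELU MLP in a nanoGPT-style transformer; the class and parameter names are illustrative, not from the repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    """SwiGLU feed-forward block: out = W2(silu(W1 x) * W3 x).

    A gated alternative to the usual GELU MLP; dimensions are illustrative.
    """
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden_dim, bias=False)  # gate projection
        self.w3 = nn.Linear(dim, hidden_dim, bias=False)  # value projection
        self.w2 = nn.Linear(hidden_dim, dim, bias=False)  # output projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w2(F.silu(self.w1(x)) * self.w3(x))
```

Because the block has three projections instead of the usual two, hidden_dim is typically shrunk (e.g. to two-thirds of the conventional 4x expansion) to keep the parameter count comparable.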
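Model ensembling is similarly easy to sketch. Since compute is unbounded, nothing stops an entry from training several models on the same 100M tokens and averaging their predictive distributions at evaluation time; averaging probabilities (rather than logits) is the standard way this lowers cross-entropy. The helper below is a hypothetical illustration, not code from the repository.

```python
import torch

def ensemble_logits(models, x):
    # Average predictive probabilities across independently trained models,
    # then return log-probabilities. Since softmax(log p) recovers p exactly,
    # this output can stand in for single-model logits when computing the
    # validation cross-entropy that determines the Slowrun score.
    probs = torch.stack([m(x).softmax(dim=-1) for m in models]).mean(dim=0)
    return probs.log()
```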