HeadlinesBriefing.com

Chess Engine Training Secrets: RL vs Distillation

Hacker News: Front Page

Chess engines have moved beyond traditional reinforcement learning: the lc0 team found that distillation from search outperforms RL training. The breakthrough came when researchers realized that even a weak model combined with search acts as an oracle that plays better than the model alone, so its search results can serve as training targets for a stronger model without expensive self-play. This technique has become standard practice in modern chess engine development.
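The idea of using "weak model plus search" as a teacher can be sketched on a toy game. The example below is an illustration only, not lc0's training pipeline: it assumes a subtraction game (take 1 to 3 stones, taking the last stone wins), a lookup table standing in for the network, and hypothetical helper names (`evaluate`, `search`, `distill`).

```python
from typing import Dict

MAX_TAKE = 3  # toy game: remove 1-3 stones per turn; taking the last stone wins


def evaluate(model: Dict[int, float], stones: int) -> float:
    """Student 'network': value in [-1, 1] from the side to move's view."""
    if stones == 0:
        return -1.0  # the opponent just took the last stone, so we lost
    return model.get(stones, 0.0)  # an untrained model knows nothing


def search(model: Dict[int, float], stones: int, depth: int) -> float:
    """Depth-limited negamax that falls back on the model at the leaves."""
    if stones == 0 or depth == 0:
        return evaluate(model, stones)
    return max(
        -search(model, stones - take, depth - 1)
        for take in range(1, min(MAX_TAKE, stones) + 1)
    )


def distill(max_stones: int, rounds: int, depth: int = 2) -> Dict[int, float]:
    """Repeatedly relabel positions with search results and fit the model."""
    model: Dict[int, float] = {}
    for _ in range(rounds):
        # Teacher = current model + search; it sees further than the model.
        targets = {n: search(model, n, depth) for n in range(1, max_stones + 1)}
        # "Training" is plain memorisation here; a real engine would run
        # gradient descent on a neural network against these targets.
        model = targets
    return model


model = distill(max_stones=12, rounds=12)
losing = [n for n in range(1, 13) if model[n] < 0]
print(losing)
```

Each round, the search output is slightly more accurate than the table it started from, so fitting the table to the search output improves it without playing a single full game.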

Stockfish and other engines now use distilled models, though some competitors avoid the approach to maintain originality. Runtime distillation takes the idea further by having the engine adapt during play: when the network's evaluation of a position differs from the search result, the engine adjusts its future evaluations accordingly. This live adaptation represents a significant efficiency gain over traditional training methods.
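One way to picture this runtime adjustment is a small table of corrections that is updated whenever search disagrees with the static evaluation. The sketch below is an assumption-laden illustration, not any engine's actual code: the class name `RuntimeCorrection`, the position `key`, and the `rate` and `limit` parameters are all hypothetical.

```python
from typing import Dict, Hashable


class RuntimeCorrection:
    """Tracks how far search results drift from the network's evaluation."""

    def __init__(self, rate: float = 0.25, limit: float = 200.0) -> None:
        self.rate = rate      # how quickly a new observation moves the average
        self.limit = limit    # cap on the correction, in centipawns
        self.table: Dict[Hashable, float] = {}

    def corrected_eval(self, key: Hashable, static_eval: float) -> float:
        """Network evaluation plus the correction learned so far for this key."""
        return static_eval + self.table.get(key, 0.0)

    def update(self, key: Hashable, static_eval: float, search_score: float) -> None:
        """Nudge the stored correction toward the observed search error."""
        error = search_score - static_eval
        old = self.table.get(key, 0.0)
        new = old + self.rate * (error - old)  # exponential moving average
        self.table[key] = max(-self.limit, min(self.limit, new))


corr = RuntimeCorrection(rate=0.5)
corr.update("some-pawn-structure", static_eval=100.0, search_score=140.0)
print(corr.corrected_eval("some-pawn-structure", 100.0))
```

The key would typically be a coarse feature of the position (hypothetically, a hash of the pawn structure) so that a lesson learned in one position carries over to similar ones later in the game.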

Perhaps most surprisingly, SPSA (simultaneous perturbation stochastic approximation) works without computing any analytic gradients. By randomly perturbing weights and measuring the resulting win rates, engines can gain up to 50 Elo, roughly equivalent to a year's development effort. The technique can tune any parameter in the search algorithm, from depth adjustments to evaluation heuristics. The finding that arbitrary C++ code can be optimized from win/loss feedback alone challenges the conventional machine learning assumption that gradients are necessary for effective training.
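A minimal sketch of the perturb-and-measure loop follows. It is simplified relative to textbook SPSA (constant step and perturbation sizes rather than decaying ones), and everything engine-specific is a stand-in: `simulated_match`, `strength`, and `HIDDEN_BEST` are hypothetical and replace real games between two engine builds.

```python
import math
import random
from typing import Callable, List, Sequence

Match = Callable[[Sequence[float], Sequence[float]], float]


def spsa_tune(params: Sequence[float], play_match: Match, iterations: int,
              rng: random.Random, step: float = 0.1,
              perturbation: float = 1.0) -> List[float]:
    """Tune params using only match scores in [-1, 1] (plus vs. minus)."""
    theta = list(params)
    for _ in range(iterations):
        # Perturb every parameter at once by a random +/- amount.
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [t + perturbation * d for t, d in zip(theta, delta)]
        minus = [t - perturbation * d for t, d in zip(theta, delta)]
        score = play_match(plus, minus)  # > 0 means "plus" did better
        # Move every parameter toward the side that won the match.
        theta = [t + step * score * d for t, d in zip(theta, delta)]
    return theta


HIDDEN_BEST = (3.0, -5.0)  # assumption: the unknown ideal parameter values


def strength(params: Sequence[float]) -> float:
    """Toy playing strength: higher is better, peaks at HIDDEN_BEST."""
    return -sum((p - best) ** 2 for p, best in zip(params, HIDDEN_BEST))


def simulated_match(plus: Sequence[float], minus: Sequence[float],
                    rng: random.Random, games: int = 200) -> float:
    """Stand-in for real games: noisy (wins - losses) / games for 'plus'."""
    diff = max(-50.0, min(50.0, strength(plus) - strength(minus)))
    p_win = 1.0 / (1.0 + math.exp(-diff))
    wins = sum(rng.random() < p_win for _ in range(games))
    return (2 * wins - games) / games


rng = random.Random(0)
tuned = spsa_tune([0.0, 0.0], lambda p, m: simulated_match(p, m, rng),
                  iterations=300, rng=rng)
print(tuned)
```

Nothing in `spsa_tune` looks inside `play_match`, which is why the same loop can tune parameters buried in arbitrary search code: the only signal it needs is which of the two perturbed versions scored better.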