HeadlinesBriefing.com

Stochastic rounding beats bias in BF16 training

Hacker News

Round‑to‑nearest repeats the same bias at every step, while stochastic rounding injects a zero‑mean error that tends to cancel. Adding 0.001 to 1.0 a thousand times in BF16 illustrates the issue. Near 1.0, adjacent BF16 values are 2^-7 ≈ 0.0078 apart, so 1.001 always rounds back down and round‑to‑nearest never moves past 1.0. Stochastic rounding reaches about 2.0 on average, because each update rounds up with probability proportional to how far the value sits into the rounding interval. Over n steps, a biased error accumulates as O(n), whereas a zero‑mean error typically grows only as O(√n).
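The accumulation example can be reproduced with a minimal sketch in plain Python. It emulates BF16 by keeping only the upper 16 bits of a float32 bit pattern. The helper names (`bf16_rne`, `bf16_sr`, `accumulate`) are hypothetical and not taken from HeavyBall or any other library, and the emulation assumes finite inputs, with no NaN or infinity handling.

```python
import random
import struct


def _f32_bits(x: float) -> int:
    """Bit pattern of x after conversion to IEEE-754 float32."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def _bits_f32(bits: int) -> float:
    """Float value of a 32-bit IEEE-754 bit pattern."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def bf16_rne(x: float) -> float:
    """Round to the nearest BF16 value, ties to even (finite inputs only)."""
    bits = _f32_bits(x)
    lsb = (bits >> 16) & 1  # lowest bit that survives truncation
    return _bits_f32((bits + 0x7FFF + lsb) & 0xFFFF0000)


def bf16_sr(x: float, rng: random.Random) -> float:
    """Stochastically round to BF16 (finite inputs only).

    Adding a uniform 16-bit integer before truncating rounds up with
    probability (discarded low bits) / 2**16.
    """
    bits = _f32_bits(x)
    return _bits_f32((bits + rng.getrandbits(16)) & 0xFFFF0000)


def accumulate(round_fn, steps: int = 1000, delta: float = 0.001) -> float:
    """Add delta to 1.0 `steps` times, rounding to BF16 after every add."""
    w = 1.0
    for _ in range(steps):
        w = round_fn(w + delta)
    return w


if __name__ == "__main__":
    rng = random.Random(0)  # one generator shared across all steps
    print("RNE:", accumulate(bf16_rne))  # stays at 1.0
    print("SR: ", accumulate(lambda v: bf16_sr(v, rng)))  # lands near 2.0
```

The stochastic result changes with the seed but stays close to 2.0, while the round‑to‑nearest result is 1.0 on every run.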

Researchers trained a small MLP on a teacher‑student regression task with HeavyBall’s AdamW, storing all weights in BF16. They switched the optimizer’s rounding mode between stochastic rounding (SR) and round‑to‑nearest‑even (RNE). SR kept the state at six bytes per parameter (the parameters plus the first‑ and second‑moment tensors) and matched the loss of a ten‑byte configuration with FP32 state. RNE also used six bytes but stalled above the FP32 baseline, indicating that rounding bias blocks convergence.
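The six‑ and ten‑byte figures follow from simple per‑parameter accounting. The sketch below assumes one reading of the setup: a BF16 parameter plus two AdamW moment tensors, with the moments in either BF16 or FP32. The function name and the dtype table are illustrative only.

```python
# Bytes per stored value for the two dtypes discussed here.
BYTES = {"bf16": 2, "fp32": 4}


def bytes_per_param(param_dtype: str, moment_dtype: str) -> int:
    """One parameter copy plus AdamW's first and second moments."""
    return BYTES[param_dtype] + 2 * BYTES[moment_dtype]


if __name__ == "__main__":
    print(bytes_per_param("bf16", "bf16"))  # 6: all-BF16 state (SR or RNE)
    print(bytes_per_param("bf16", "fp32"))  # 10: BF16 weights, FP32 moments
```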

These results suggest that stochastic rounding removes the bias that otherwise holds back training with BF16 state, which stores each value in half the memory of FP32. By preserving numerical fidelity without extra storage, SR lets developers deploy larger models on bandwidth‑constrained hardware while keeping accuracy comparable to full precision. In this experiment, removing bias, not adding memory, is what closed the gap to the FP32 baseline.