HeadlinesBriefing.com

Draft chapter reveals CPU cycle costs and branch penalties

Hacker News

The draft of Chapter 4 of the forthcoming book *Efficient C++ Programming for Modern 64‑bit CPUs* has landed on Hacker News. Authors Sergey Ignatchenko and Dmytro Ivanchykhin use a “representative” motherboard diagram to discuss signal latency, parasitic capacitance, and the practical limits of intra‑core communication. They argue that distance, not light speed, dominates the few‑hundred‑picosecond access times seen in 2026 silicon. Understanding these constraints guides low‑level optimization.

Typical register‑register adds and bitwise ops complete in a single cycle, while multiplication costs three to six cycles and division up to twenty, a dramatic drop from the hundred‑plus cycles reported a few years ago. The authors note that superscalar designs host multiple ALUs and SIMD units, so several instructions can execute at once, giving non‑integer average latencies such as 0.75 cycles per instruction and RIPC values approaching twelve in ideal workloads. Cache hierarchies affect throughput.
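
The point about multiple ALUs can be illustrated with a small sketch. It is not taken from the draft chapter: the function names `sum_single_chain` and `sum_four_chains` are hypothetical, and the example assumes a compiler that does not already vectorize or unroll the simple loop on its own. The first function forms one long dependency chain, so each add waits for the previous result; the second keeps four independent accumulators that a superscalar core may be able to execute in parallel.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One dependency chain: every add needs the result of the previous add,
// so the loop is bounded by the latency of a single add per element.
std::uint64_t sum_single_chain(const std::vector<std::uint64_t>& v) {
    std::uint64_t acc = 0;
    for (std::uint64_t x : v) {
        acc += x;
    }
    return acc;
}

// Four independent chains: the adds into a0..a3 do not depend on each
// other, so a core with several ALUs may issue them in the same cycle.
// The chain count of four is an illustrative assumption, not a tuned value.
std::uint64_t sum_four_chains(const std::vector<std::uint64_t>& v) {
    std::uint64_t a0 = 0, a1 = 0, a2 = 0, a3 = 0;
    const std::size_t n = v.size();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a0 += v[i];
        a1 += v[i + 1];
        a2 += v[i + 2];
        a3 += v[i + 3];
    }
    // Handle the remaining 0..3 elements.
    for (; i < n; ++i) {
        a0 += v[i];
    }
    // Unsigned arithmetic wraps, so the regrouped sum equals the simple one.
    return (a0 + a1) + (a2 + a3);
}
```

Both functions return the same value for any input. Whether the second is actually faster depends on the CPU, the compiler, and the optimization level, so any speedup should be measured rather than assumed.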

When a branch instruction appears, CPUs speculate and may discard work if the guess fails, costing 15‑25 cycles per misprediction. The draft explains that [[likely]]/[[unlikely]] attributes influence static prediction but offer limited benefit once dynamic branch history accumulates. The authors advise reserving these hints for truly rare error paths rather than everyday logic. Misprediction penalties dominate many workloads.
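
A minimal sketch of that advice, assuming a C++20 compiler: the function `parse_digits` below is a hypothetical example, not code from the draft. It marks only the error paths (empty input, a non-digit character, overflow) as `[[unlikely]]` and leaves the ordinary loop logic without hints.

```cpp
#include <cstdint>
#include <limits>
#include <optional>
#include <string_view>

// Hypothetical helper: parses a string of decimal digits into a 32-bit
// unsigned value. Only the rare failure branches carry [[unlikely]].
std::optional<std::uint32_t> parse_digits(std::string_view s) {
    if (s.empty()) [[unlikely]] {
        return std::nullopt;
    }
    constexpr std::uint32_t max = std::numeric_limits<std::uint32_t>::max();
    std::uint32_t value = 0;
    for (char c : s) {
        if (c < '0' || c > '9') [[unlikely]] {
            return std::nullopt;  // malformed input
        }
        const std::uint32_t digit = static_cast<std::uint32_t>(c - '0');
        // Reject before value * 10 + digit would exceed the 32-bit range.
        if (value > (max - digit) / 10) [[unlikely]] {
            return std::nullopt;  // overflow
        }
        value = value * 10 + digit;
    }
    return value;
}
```

For example, `parse_digits("12345")` yields 12345, while `parse_digits("12a")` yields an empty optional. The attributes are only hints; a compiler is free to ignore them, and their effect on generated code varies between toolchains.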