HeadlinesBriefing.com

Load‑balanced latency shrinks with more servers

Hacker News

Consider an M/M/c system: c identical servers, each handling one request at a time with no internal queue, sit behind a load balancer that holds an infinite queue. Each request takes one second to serve on average, and clients collectively submit c × 0.8 requests per second, so the load per server stays constant as the fleet grows. The question posed: as c grows, does the client‑observed mean latency shrink toward one second, stay flat, improve linearly, or worsen?

The Erlang C formula, $E_{2,n}(A)$, gives the probability that a request arriving at n servers with offered load A has to queue, and it reveals the curve. At half the saturation point, a five‑server system (2.5 rps offered) queues only 13% of traffic; doubling to ten servers raises the offered load to 5 rps yet drops the queueing probability to 3.6%. In the ten‑server system, 96.4% of requests therefore incur no extra wait, and as c grows further the mean latency approaches the one‑second service time asymptotically.
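The figures above can be checked numerically. The following is a minimal sketch, not the original author's code: it assumes a mean service time of one second and computes Erlang C from the standard Erlang B recurrence. The function names (`erlang_b`, `erlang_c`, `mean_latency`) are illustrative choices.

```python
def erlang_b(servers: int, offered_load: float) -> float:
    """Erlang B blocking probability, via the numerically stable recurrence."""
    b = 1.0
    for k in range(1, servers + 1):
        b = offered_load * b / (k + offered_load * b)
    return b


def erlang_c(servers: int, offered_load: float) -> float:
    """Erlang C: probability that an arriving request has to queue (M/M/c)."""
    if offered_load >= servers:
        raise ValueError("offered load must stay below the number of servers")
    rho = offered_load / servers  # per-server utilization
    b = erlang_b(servers, offered_load)
    return b / (1.0 - rho * (1.0 - b))


def mean_latency(servers: int, offered_load: float, service_time: float = 1.0) -> float:
    """Mean client-observed latency: service time plus expected queueing delay."""
    # Assumption: exponential service with the given mean, as in an M/M/c model.
    wait = erlang_c(servers, offered_load) * service_time / (servers - offered_load)
    return service_time + wait


if __name__ == "__main__":
    # The 50%-load example from the text: 5 servers at 2.5 rps, 10 servers at 5 rps.
    for c in (5, 10):
        print(f"c={c:>2}  P(queue)={erlang_c(c, 0.5 * c):.1%}")
    # The 80%-load question: offered load grows as 0.8 * c.
    for c in (1, 2, 5, 10, 20, 50):
        print(f"c={c:>2}  mean latency={mean_latency(c, 0.8 * c):.3f} s")
```

The first loop reproduces the 13% and 3.6% queueing probabilities quoted above; the second shows the mean latency at 80% per-server load falling toward the one-second floor as c grows.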

Monte‑Carlo simulations confirm that median (p50) latency follows the mean curve, and the high percentiles (99th and above) share the same downward shape, so the improvement hides no tail problems. For cloud operators, the result means that adding modest numbers of servers to a pool lowers latency at unchanged per‑server throughput; equivalently, a larger pool can run at higher utilization without sacrificing response time. The analysis holds as long as the total arrival rate stays below the total service capacity.
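A simulation along these lines can be sketched in a few lines of Python. This is an illustrative reconstruction under stated assumptions, not the simulation used in the original analysis: Poisson arrivals at 0.8 × c requests per second, exponential service with a one-second mean, and first-come-first-served dispatch to whichever server frees up first. The names `simulate_mmc` and `percentile` are hypothetical.

```python
import heapq
import random


def simulate_mmc(servers, load_per_server=0.8, service_time=1.0,
                 requests=200_000, seed=1):
    """Simulate an M/M/c FIFO queue and return each request's latency."""
    rng = random.Random(seed)
    arrival_rate = servers * load_per_server / service_time
    free_at = [0.0] * servers  # min-heap of times at which each server goes idle
    now = 0.0
    latencies = []
    for _ in range(requests):
        now += rng.expovariate(arrival_rate)      # next Poisson arrival
        earliest_free = heapq.heappop(free_at)    # server that frees up first
        start = max(now, earliest_free)           # wait only if all servers are busy
        finish = start + rng.expovariate(1.0 / service_time)
        heapq.heappush(free_at, finish)
        latencies.append(finish - now)            # queueing delay plus service time
    return latencies


def percentile(sorted_values, q):
    """Nearest-rank percentile of an already sorted list, q in [0, 100]."""
    index = min(len(sorted_values) - 1, int(q / 100.0 * len(sorted_values)))
    return sorted_values[index]


if __name__ == "__main__":
    for c in (1, 2, 5, 10, 20, 50):
        lat = sorted(simulate_mmc(c))
        mean = sum(lat) / len(lat)
        print(f"c={c:>2}  mean={mean:.2f}  p50={percentile(lat, 50):.2f}  "
              f"p99={percentile(lat, 99):.2f}")
```

Because requests are served in arrival order, each one starts at the later of its arrival time and the earliest server-free time, which is why a single min-heap is enough and no explicit queue is needed.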