HeadlinesBriefing.com

Why Users Experience Longer Wait Times Than Your Metrics Show

Hacker News

Marc Brooker from Amazon Web Services explains why users like Alice experience slower service than metrics indicate. While service dashboards might show 100ms average response times, users report waiting one second. This happens because Alice measures time in actual seconds, not in request counts.

The disconnect stems from what Brooker calls the inspection paradox. Service metrics count each request equally, but a user is more likely to be caught inside a long request than a short one, so what users experience is weighted by duration. For a request time X, users see E[X^2]/E[X] = E[X] + Var(X)/E[X] rather than just the mean E[X]. Long requests and outages dominate user perception because they consume a disproportionate share of time.
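The gap between the two averages can be checked numerically. The sketch below is only an illustration: the sample of latencies is hypothetical, and the function names are not from Brooker's article. It compares the per-request mean with the time-weighted mean and confirms that the latter matches E[X] + Var(X)/E[X].

```python
from statistics import fmean, pvariance


def request_weighted_mean(latencies):
    """Mean as a dashboard sees it: every request counts once."""
    return fmean(latencies)


def time_weighted_mean(latencies):
    """Mean as a waiting user sees it: each request is weighted by
    its own duration, which gives sum(x^2) / sum(x)."""
    return sum(x * x for x in latencies) / sum(latencies)


# Hypothetical sample (seconds): 99 fast requests and one slow one.
latencies = [0.1] * 99 + [10.0]

mean = request_weighted_mean(latencies)        # about 0.2 s
experienced = time_weighted_mean(latencies)    # about 5.1 s

# The identity from the text: E[X] + Var(X) / E[X].
identity = mean + pvariance(latencies) / mean

print(f"per-request mean:   {mean:.3f} s")
print(f"time-weighted mean: {experienced:.3f} s")
print(f"E[X] + Var/E[X]:    {identity:.3f} s")
```

A single slow request barely moves the per-request mean, yet it accounts for about half of all time spent waiting, so it dominates the time-weighted figure.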

A concrete example illustrates this gap: with a median time-to-recovery of 30 minutes and a 99th percentile of 600 minutes, the service reports a mean time to recovery (MTTR) of just over one hour. Yet a user caught in an outage experiences roughly six hours of downtime on average, because long outages affect far more user time than short ones. The heavy tail matters enormously for customer experience.
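The figures above can be reproduced under one assumption that this summary does not state: that recovery times follow a lognormal distribution. The sketch below fits a lognormal to the median and 99th percentile, then computes the plain mean and the time-weighted mean E[X^2]/E[X]. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def lognormal_params(median, p99):
    """Fit lognormal (mu, sigma) from a median and a 99th percentile.
    Assumes recovery times are lognormal; that is a modelling choice
    made here for illustration, not something stated in the summary."""
    mu = math.log(median)                      # median = exp(mu)
    z99 = NormalDist().inv_cdf(0.99)           # standard normal 99% quantile
    sigma = (math.log(p99) - mu) / z99         # p99 = exp(mu + z99 * sigma)
    return mu, sigma


mu, sigma = lognormal_params(median=30.0, p99=600.0)

# Lognormal mean: E[X] = exp(mu + sigma^2 / 2)
mttr = math.exp(mu + sigma**2 / 2)

# Time-weighted mean: E[X^2] / E[X] = exp(mu + 3 * sigma^2 / 2)
experienced = math.exp(mu + 1.5 * sigma**2)

print(f"reported MTTR:      {mttr:.0f} minutes")
print(f"experienced outage: {experienced:.0f} minutes")
```

Under this assumption the reported MTTR comes out a little over an hour and the experienced figure at roughly six hours, in line with the numbers quoted above.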

Timeout-and-retry patterns can mask latency issues, but recovery time has no such hiding place. Brooker argues this insight makes tail latency critical to understand, and explains why trimmed measurements discard essential context about user experience.
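To see what trimming throws away, the sketch below uses a made-up list of recovery times and a hypothetical helper that drops the largest values before averaging. It is an illustration of the general point, not Brooker's own analysis.

```python
from statistics import fmean


def upper_trimmed_mean(values, drop_top):
    """Mean after discarding the `drop_top` largest values."""
    ordered = sorted(values)
    kept = ordered[: len(ordered) - drop_top] if drop_top else ordered
    return fmean(kept)


# Hypothetical recovery times in minutes; one long outage in the tail.
recoveries = [20, 25, 30, 30, 35, 40, 45, 50, 60, 600]

full_mean = fmean(recoveries)
trimmed = upper_trimmed_mean(recoveries, drop_top=1)
# What a user caught in an outage sees: sum(x^2) / sum(x).
time_weighted = sum(x * x for x in recoveries) / sum(recoveries)

print(f"mean:          {full_mean:.1f} minutes")
print(f"trimmed mean:  {trimmed:.1f} minutes")
print(f"time-weighted: {time_weighted:.1f} minutes")
```

Dropping the single 600-minute outage moves the average in the opposite direction from what users experience: the trimmed figure falls below the plain mean, while the time-weighted figure sits far above it.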