Hacker News

> Alice says your service is slow. You tell Alice that the mean request to your service completes in 100ms, but Alice says that her mean wait time is 1s.

There are also plenty of situations where a service can have a bimodal performance distribution and the impact of that can fall on certain users disproportionately.

Imagine a retail website that serves images from a global CDN, with cache misses pulled from a server in the EU. Users who visit our homepage, or look at our bestselling products, get a cache hit from the CDN node close to them, in 50ms. But users who look at our long-tail products get a cache miss - and if they're not near Europe, they'll get a noticeable delay.

Hence our mean image load time is 100ms - but a customer browsing a product category that's obscure for their location can experience markedly worse performance. If Alice is the only person in Costa Rica looking at ski equipment in June, she's going to get a lot of cache misses.
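To put rough numbers on it, here's a back-of-the-envelope sketch. Only the 50ms cache hit and the 100ms overall mean come from the example above; the 550ms miss latency and both miss rates are made-up assumptions, picked so the overall mean works out to 100ms.

```python
# Hypothetical numbers: only the 50 ms cache hit and the 100 ms overall mean
# come from the example; the miss latency and miss rates are assumed.
HIT_MS = 50
MISS_MS = 550  # assumed latency of a cache miss served from the EU origin


def mean_latency_ms(miss_rate: float) -> float:
    """Expected image load time for a given cache miss rate (two-point mixture)."""
    return (1 - miss_rate) * HIT_MS + miss_rate * MISS_MS


overall = mean_latency_ms(0.10)  # assumed: 10% of all requests miss
alice = mean_latency_ms(0.90)    # assumed: 90% of Alice's requests miss

print(f"overall mean: {overall:.0f} ms")
print(f"Alice's mean: {alice:.0f} ms")
```

Under these assumptions the dashboard's mean and Alice's mean are both "correct", they just average over very different mixes of hits and misses.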

esperent 5 hours ago

So what's the solution here? Keep the cache artificially warm for even obscure routes?

steveBK123 4 hours ago

There may not be a solution in every case, but it's a reminder that dashboards & metrics are no replacement for actually talking to your users. Metrics are at best a proxy for user experience; don't let them be the tail that wags the dog.

Like the story Bezos told of his execs claiming call wait times were under 1 minute, so he called the service line from the conference room on the spot and made everyone sit there for 10 minutes waiting to get through.