> More technically, what’s going on here is the inspection paradox. Alex and Alice don’t experience your latency distribution, they experience a t-weighted version of it

Ooh I got pushed in the 2m end of the pool there. What is the intuition? The ten hundred most popular words sort of thing.

I am very interested in this article though. At first I assumed it would be about TTFB vs. time to render the page after all those async useEffects have run, but it isn't. This is something else, and I want to understand it.

Perhaps an easier to intuit version of it is how full airliners are.

An airline might report that their flights are on average 60% full, and that might be completely absolutely 100% true. But that's not what passengers experience. If we assume (for convenience) that a plane holds 100 people, when the plane is 20% full then 20 passengers experience that, but when the plane is 100% full then 100 passengers experience that. On average, from a passenger's point of view, the flights are much more than 60% full--it might be 70 or 80%--because a full flight is experienced by more passengers than an empty flight.

For a concrete example imagine two flights, one 20% full and one 100% full: the average is 60% from the airline's point of view, but 100 passengers experienced a full flight and only 20 experienced the 20% full flight, so from the passenger's point of view the average is 86.7% full.
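To make the two-flight arithmetic easy to replay, here is a minimal Python sketch. The function name `passenger_weighted_load` and the fixed 100-seat capacity are assumptions for illustration (the capacity is the same convenience assumption as above); nothing here comes from a real airline dataset.

```python
def passenger_weighted_load(loads, capacity=100):
    """Average load factor as experienced by passengers.

    loads: per-flight load factors between 0 and 1.
    capacity: seats per plane (assumed identical for every flight).
    """
    # Each flight is counted once per passenger on board.
    passengers = [load * capacity for load in loads]
    return sum(p * load for p, load in zip(passengers, loads)) / sum(passengers)


flights = [0.20, 1.00]  # one 20% full flight, one 100% full flight
airline_view = sum(flights) / len(flights)        # every flight counts once
passenger_view = passenger_weighted_load(flights)  # every passenger counts once
print(f"airline: {airline_view:.1%}, passengers: {passenger_view:.1%}")
```

With these two flights the airline's figure is 60% and the passengers' figure is 104/120, about 86.7%.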

The same logic applies to outages. If you have an outage that lasts one minute then only a few users will encounter it. If you have an outage that lasts one hour then many more users will encounter that. The longer the outage is, the more likely any given user is to encounter it, so from the user's point of view the "average" outage is much longer than the "true" average where you weight every outage equally.

Again we can consider a concrete example: imagine you run a website that gets 100 visitors per minute. You have one outage that lasts 1 minute, then later a second outage that lasts 9 minutes. Your average outage time is 5 minutes. But 100 visitors experienced the 1 minute outage, while 900 visitors experienced the 9 minute outage, so from the point of view of a visitor the average outage is (900*9 + 100*1)/1000 = 8.2 minutes.

I think you got the `900*9` wrong if you're talking about experienced downtime. If you calculate in discrete minutes:

(900*4.5 + 100*1)/1000 = 4.15 min

(Unless you manage to tell the user how long the website has already been down.)

This could be made more accurate if we calculate it over seconds, which would drive the experienced downtime even lower!

It's 100*10*4.5 inside, you're summing 100*9+100*8+...+100*1.

So (4500 + 100)/1000 = 4.6 min, i.e. 4:36
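For anyone who wants to check both numbers in this sub-thread, here is a rough Python sketch. The function names are made up for illustration, and the second function bakes in the discrete-minute assumption used above: a visitor arriving in a given minute waits from the start of that minute until the outage ends.

```python
VISITORS_PER_MINUTE = 100  # from the example upthread


def average_outage_seen(outage_minutes):
    """Average total length of the outage a visitor lands in."""
    weighted = sum(VISITORS_PER_MINUTE * m * m for m in outage_minutes)
    visitors = sum(VISITORS_PER_MINUTE * m for m in outage_minutes)
    return weighted / visitors


def average_remaining_downtime(outage_minutes):
    """Average downtime still ahead of a visitor when they arrive.

    An m-minute outage contributes m + (m-1) + ... + 1 minutes of waiting
    per 100 visitors (discrete-minute assumption).
    """
    waited = sum(VISITORS_PER_MINUTE * sum(range(1, m + 1)) for m in outage_minutes)
    visitors = sum(VISITORS_PER_MINUTE * m for m in outage_minutes)
    return waited / visitors


outages = [1, 9]
print(sum(outages) / len(outages))          # operator's plain average
print(average_outage_seen(outages))         # length of the outage you hit
print(average_remaining_downtime(outages))  # time you actually sit through
```

The three figures answer three different questions: 5 minutes per outage, 8.2 minutes for the outage a random visitor lands in, and 4.6 minutes of downtime that visitor actually waits through.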

So clear! Thanks!!

When you measure latency, you’re measuring it based on requests. So in some bucket if you had a request take 2s and a request take 10s, you would say the average is 6s. This answers the question “how long should I expect a single request to take”.

But the article's point is that to the people - it’s not the number of requests that matters - it’s how long they spend waiting for them. The question is “how much time am I sitting here waiting?” In that case the 10s request is 5x worse than a 2s request - it takes 5x of the “time spent”.

So you can change the weighting from 1/2 for the 2s request and 1/2 for the 10s request to 2/12 and 10/12 - that gives us 17% and 83%. And the average there is 2 * 2/12 + 10 * 10/12 = 104/12, about 8.67s.
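A small Python sketch of that reweighting, in case it helps. The function names are illustrative only, not anything from the article: the first weights every request equally, the second weights each request by its share of the total waiting time.

```python
def request_weighted_mean(latencies):
    """Plain mean: every request counts once."""
    return sum(latencies) / len(latencies)


def time_weighted_mean(latencies):
    """Mean where each request is weighted by its share of total time."""
    total = sum(latencies)
    return sum(t * (t / total) for t in latencies)


latencies = [2.0, 10.0]  # seconds, the two requests from the example
print(request_weighted_mean(latencies))
print(time_weighted_mean(latencies))
```

For the 2s and 10s requests this gives 6s per request, but 104/12 (about 8.67s) per second spent waiting.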

The difference is the question. If you think about it as road segments, let’s say you have a group of road segments with different lengths. You can ask the “average length of segment” - or you can ask “if you pick a random point among all the segments, how long is the segment that I landed in?” You’re picking very differently there - the second is proportional to the length!

Technically IMO the blog is slightly off - you want to use “mean residual life”. I.e. if I pick a random TIME, how long do I have to wait for my request to finish? But it’s reasonably close.

Similarly curious about this. The intuition I extracted:

Let’s say we have 10 requests, where 9 of them take 1 second to complete but one takes 100 seconds. The average time to complete a request is about 11 seconds (109/10), but if you experience the requests in series, at any given time you’re much more likely to be sitting and waiting in that 100 second request.

So if you imagine a long series of requests from this distribution and place yourself randomly in the series, the average time to completion is just a bit less than 50 seconds.

This is what is meant by t-weighted, that events with a large t take a larger place.
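Here is a quick Python sketch that checks the "a bit less than 50 seconds" figure two ways, assuming the requests run back to back with no idle gaps and that you drop in at a uniformly random instant. The closed form used is the standard mean-residual-time expression E[T²] / (2·E[T]); the function names, sample count, and seed are arbitrary choices for illustration.

```python
import random


def mean_residual_wait(durations):
    """Expected time left in the current request at a random instant:
    E[T^2] / (2 * E[T]), assuming back-to-back requests."""
    return sum(t * t for t in durations) / (2 * sum(durations))


def simulate_residual_wait(durations, samples=200_000, seed=0):
    """Monte Carlo check: pick random instants, measure time to completion."""
    rng = random.Random(seed)
    total = sum(durations)
    acc = 0.0
    for _ in range(samples):
        x = rng.random() * total  # random instant within one pass of the series
        for t in durations:
            if x < t:
                acc += t - x  # time remaining in the request we landed in
                break
            x -= t
    return acc / samples


durations = [1] * 9 + [100]  # nine 1 s requests and one 100 s request
print(sum(durations) / len(durations))    # per-request mean
print(mean_residual_wait(durations))      # closed form
print(simulate_residual_wait(durations))  # simulation, should land close by
```

The per-request mean is 10.9 seconds, while the closed form gives 10009/218, roughly 45.9 seconds of remaining wait.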

I see, it is about a long series of requests. Makes sense. I'll start looking at latency at p99.9 and p99.99 more often now!

When we measure the average experience, it's crucial what we are sampling/measuring uniformly to construct that experience.

The service provider is choosing to weight all requests uniformly, and average over requests -- some have 10s latency and some have 1s latency.

The user lives in time, and chooses to weight their time intervals equally. So a 10 second pause carries 10 times more weight for them than a 1 second pause -- because they experience it 10 times as much! So their average experience is a different weighted average.

The conceptual point is that averaging always needs a measure, and implicitly assumes one if you aren't explicit about the choice.

AIUI: my intuition is Alex and Alice are points in the distribution. They don’t think about their experience in terms of population statistics. They see their individual latency times, and use that as their sample. If t is low in their experience, great, the distribution looks low.

But for any t that goes high that they observe (which tends to be the case in a skewed distribution such as service latencies), it drags their impression of the distribution up, dominating the shape of that impression.

Arithmetic mean is just really bad for latency conversations (ditto MTTR). Other averages have their place but for a legible, accessible chart that's 4 lines in anything: p50, p75, p95, p99.9 with the last having the SLA is IMHO the right thing to goal and alert on in a cross-functional setting that's attaching engineering outcomes to business outcomes.

There's better math for advanced introspection, but for stuff everyone in the room can intuit no matter their discipline, that's a really sweet spot.

And it's motivating: the p99.9 latency is a bunch of quick, high-impact wins if you haven't profiled it yet. A good time is had by all.
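As a rough illustration of the p50/p75/p95/p99.9 lines mentioned above, here is a stdlib-only Python sketch using the nearest-rank percentile definition (one of several common definitions; monitoring tools may interpolate differently). The sample latencies are invented to show how the mean hides the tail.

```python
import math


def percentile(sorted_values, p):
    """Nearest-rank percentile of pre-sorted data; p is in (0, 100]."""
    rank = math.ceil(p / 100 * len(sorted_values))
    return sorted_values[max(rank, 1) - 1]


def latency_summary(latencies_ms, levels=(50, 75, 95, 99.9)):
    """Return the percentile lines you would put on the chart."""
    ordered = sorted(latencies_ms)
    return {f"p{p:g}": percentile(ordered, p) for p in levels}


# Hypothetical sample: mostly fast, a slower band, and a small slow tail.
latencies_ms = [10] * 900 + [50] * 90 + [2000] * 10
print(sum(latencies_ms) / len(latencies_ms))  # arithmetic mean
print(latency_summary(latencies_ms))
```

On this made-up sample the mean is 33.5 ms, a value no request actually had, while the percentile lines show 10 ms at p50 and p75, 50 ms at p95, and 2000 ms at p99.9.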