Floating-point arithmetic isn't associative, even for operations that are associative in ordinary math.
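A minimal Python sketch of the non-associativity point (Python floats are IEEE 754 binary64). `left_sum` is a hypothetical helper that mimics a strictly sequential reduction. It is not how any particular GPU/TPU kernel reduces; those choose their own orders, which is exactly why grouping matters.

```python
import math

def left_sum(values):
    """Hypothetical helper: sum strictly left to right, like a naive sequential reduction."""
    total = 0.0
    for v in values:
        total += v
    return total

# Same three numbers, two groupings: the results differ in the last bit.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
print(a, b, a == b)  # 0.6000000000000001 0.6 False

# Reordering a sum (e.g. a parallel reduction that splits work differently)
# can change the answer by far more than one ulp when magnitudes differ widely.
print(left_sum([1e16, 1.0, -1e16]))   # 0.0  (the 1.0 is absorbed by 1e16)
print(left_sum([1e16, -1e16, 1.0]))   # 1.0
print(math.fsum([1e16, 1.0, -1e16]))  # 1.0  (correctly rounded reference)
```

Which order a kernel actually uses can depend on things like batch size, tiling, or hardware, so identical inputs can produce slightly different outputs.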

That would just add up to statistical noise, not a 10% degradation over a week.

Catastrophic error accumulation can have far more profound effects than noise, because rounding errors aren't guaranteed to cancel out.
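To make the "not just noise" point concrete, here is a minimal sketch that emulates float32 in pure Python via `struct`. It assumes a simple scalar accumulator adding the same value over and over, a hypothetical stand-in rather than how any particular inference stack sums values. Within a given magnitude range each addition rounds in the same direction, so the error builds up steadily instead of averaging out. Kahan (compensated) summation is included as a standard mitigation for comparison.

```python
import struct

def f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def naive_sum_f32(x: float, n: int) -> float:
    """Add x to a float32 accumulator n times, rounding after every addition."""
    x = f32(x)
    total = 0.0
    for _ in range(n):
        total = f32(total + x)
    return total

def kahan_sum_f32(x: float, n: int) -> float:
    """The same sum with Kahan (compensated) summation, every step in float32."""
    x = f32(x)
    total = 0.0
    comp = 0.0  # running estimate of the low-order bits lost so far
    for _ in range(n):
        y = f32(x - comp)               # correct the next term by the lost bits
        t = f32(total + y)              # this addition may drop low-order bits
        comp = f32(f32(t - total) - y)  # recover what was just dropped
        total = t
    return total

n = 200_000
naive = naive_sum_f32(0.1, n)
kahan = kahan_sum_f32(0.1, n)
print(naive, abs(naive - 20000.0))  # drifts far from 20000: the rounding is biased
print(kahan, abs(kahan - 20000.0))  # stays within a few ulps of 20000
```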

Just to make sure I've got this right: they serve millions of requests a day, somehow catastrophic error accumulation is what's causing the 10% degradation, and no one at Anthropic is noticing it. Is that the theory?

FYI, something along those lines happened last August/September: an inference bug caused worse performance on TPUs than on GPUs.