"Lots of variance in the score can come from random stuff like even Anthropic's servers being overloaded"

Aha, so the models do degrade under load.