Is that really going to matter in FP32, FP16 or BF16? I would think models would be written so they'd be at least somewhat numerically stable.

Also, if the inference provider guarantees specific hardware, this shouldn't happen.

Wait, wouldn't it be more significant at low precision, which is the whole reason low-bit formats are avoided in maths applications? In any modelling work I've ever done, low-precision floats were exactly why the order of operations mattered, whereas with float64 it's mostly negligible.
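
To put a rough number on what I mean, here's a small Python/NumPy sketch (the array, size and seed are just made up for illustration): sum the same values in two different orders and compare across precisions. The gap is plainly visible at float16 and essentially vanishes at float64:

    import numpy as np

    rng = np.random.default_rng(0)           # arbitrary seed, purely illustrative
    values = rng.standard_normal(100_000)

    for dtype in (np.float16, np.float32, np.float64):
        x = values.astype(dtype)
        in_order  = x.sum(dtype=dtype)                   # one summation order
        reordered = rng.permutation(x).sum(dtype=dtype)  # same values, different order
        diff = abs(float(in_order) - float(reordered))
        print(f"{np.dtype(dtype).name}: |difference| = {diff:.2e}")

As I understand it, it's the same non-associativity at work in every precision; it's just orders of magnitude smaller at 64 bits.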