I initially thought the same, but apparently, with the inaccuracies inherent to floating-point arithmetic and other sources of accuracy loss, it’s not true!

https://arxiv.org/html/2408.04667v5

This has nothing to do with FP inaccuracies, and your link does confirm that:

“Although the use of multiple GPUs introduces some randomness (Nvidia, 2024), it can be eliminated by setting random seeds, so that AI models are deterministic given the same input. […] In order to support this line of reasoning, we ran Llama3-8b on our local GPUs without any optimizations, yielding deterministic results. This indicates that the models and GPUs themselves are not the only source of non-determinism.”

I believe you've misread: the Nvidia article and your quote support my point. Only by disabling the FP optimizations are the authors able to stop the inaccuracies.

First, the “optimizations” are not IEEE 754 compliant. So nondeterminism in floating-point operations is not an inherent property of floating-point arithmetic; it’s a consequence of disregarding the standard by deliberately opting in to such nondeterminism.
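To make the distinction concrete, here is a minimal sketch (plain Python floats, which are IEEE 754 doubles, not the paper's GPU setup): each individual operation is fully deterministic, but addition is not associative, so a parallel reduction that sums the same terms in a different order on each run can produce different totals.

```python
# IEEE 754 addition is deterministic per operation, but NOT associative:
# regrouping the same three terms changes the rounded result.
left = (0.1 + 0.2) + 0.3   # groups the first two terms
right = 0.1 + (0.2 + 0.3)  # groups the last two terms

print(left)           # 0.6000000000000001
print(right)          # 0.6
print(left == right)  # False: same terms, different order, different sum
```

This is why run-to-run variation appears only when the execution schedule (and hence the reduction order) is allowed to vary; fixing the order restores bit-identical results without giving up floating point.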

Secondly, as I quoted, the paper explicitly makes the point that there is a source of nondeterminism outside the models and GPUs, so ensuring that the floating-point arithmetic is deterministic doesn’t help.