> GPUs put the associativity of the sums in matrix multiplications in arbitrary order
That’s user-controlled too, not an inherent property of GPUs:
https://docs.pytorch.org/docs/2.12/generated/torch.use_deter...
Under these settings, the only matrix multiplication the docs list as deterministic is the sparse-dense product:
> torch.bmm() when called on sparse-dense CUDA tensors
And it's not listed under the operations that raise an exception otherwise, so I'm not sure the docs promise that dense-dense matrix-matrix products are deterministic.
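For anyone wondering why the order of the sums matters at all: floating-point addition isn't associative, so regrouping or reordering the same terms can change the result. Here's a minimal sketch in plain Python, no GPU or PyTorch involved. The `dot` helper and the input values are made up for illustration, and this says nothing about what any particular CUDA kernel actually does.

```python
# Floating-point addition is not associative: grouping changes the result.
left_grouped = (0.1 + 0.2) + 0.3
right_grouped = 0.1 + (0.2 + 0.3)
print(left_grouped == right_grouped)  # False

def dot(xs, ys, order):
    """Hypothetical helper: dot product that accumulates terms in `order`."""
    total = 0.0
    for i in order:
        total += xs[i] * ys[i]
    return total

# Illustrative values chosen so the effect is large, not typical data.
xs = [1e16, 1.0, -1e16]
ys = [1.0, 1.0, 1.0]

forward = dot(xs, ys, [0, 1, 2])    # (1e16 + 1.0) - 1e16: the 1.0 is rounded away
reordered = dot(xs, ys, [0, 2, 1])  # (1e16 - 1e16) + 1.0: the 1.0 survives
print(forward, reordered)  # 0.0 1.0
```

Same inputs, same math on paper, different answer depending on accumulation order. A reduction whose order isn't pinned down can therefore differ from run to run.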
Oh, thanks, that’s interesting, I thought it covered that too!