Can you point to something that says it's not? The only source of non-determinism I've read about affecting LLM training is floating-point error, which is well understood and worked around easily enough.
Search more; there is a lot of literature discussing how hard the problem of reproducibility in GenAI/LLMs/deep learning is, how far we are from solving it even for trivial/small models (let alone for beasts the size of the most powerful ones), and even how pointless the whole exercise is.
If there's a lot, then it should be easy for you to link an example, right? One that points toward something other than floating-point error.
There simply aren't that many sources of non-determinism in a modern computer.
Though I'll grant that if you've engineered your codebase for speed and not for determinism, non-determinism can creep in via floating-point rounding, sloppy ordering of operations, etc. These are not unavoidable implementation details, however; CAD kernels and other scientific software manage it every day.
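To make the ordering point concrete, here's a toy Python sketch (made-up numbers, purely illustrative): floating-point addition isn't associative, so a parallel reduction whose order varies from run to run can differ in the last bits, while a fixed order is exactly repeatable.

    # Floating-point addition is not associative, so reduction order matters:
    a, b, c = 1e16, -1e16, 1.0
    print((a + b) + c)   # 1.0
    print(a + (b + c))   # 0.0 -- the 1.0 is absorbed into -1e16 before it can survive

    # A parallel sum whose chunk order varies run to run can therefore drift in
    # the last bits; summing in a fixed order gives the same bits every time.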
When you boil down what's actually happening during training, it's just a bunch of matrix math. And math is highly repeatable. The size of the matrices has nothing to do with it.
I have little doubt that some implementations aren't deterministic, due to software engineering choices as discussed above. But the algorithms absolutely are. Claiming otherwise seems equivalent to claiming that 2 + 2 can sometimes equal 5.
> I have little doubt that some implementations aren't deterministic
Not some of them; ALL OF THEM. Engineering training pipelines for absolute determinism would be, quite frankly, extremely dumb, so no one does it. When you need millions of dollars' worth of compute to train a non-toy model, are you going to double or triple your cost just so that the process is deterministic, without actually making the end result perform any better?
Depends on how much you value repeatability in testing, and how much compute you have. It's a choice that has been made often in the history of computer science.
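For a sense of what opting in looks like, here's a rough sketch using PyTorch's standard determinism knobs (the make_deterministic helper name is just mine; the exact flags and their performance cost vary by version, backend, and which ops dominate the model, so treat it as an illustration rather than a recipe):

    import os
    import random

    import numpy as np
    import torch

    # cuBLAS needs this set before use to pick deterministic GEMM kernels on CUDA.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

    def make_deterministic(seed: int = 0) -> None:
        # Seed every RNG the training loop might touch.
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)

        # Ask PyTorch for deterministic kernels (it raises an error when an op
        # has no deterministic implementation), at some speed cost.
        torch.use_deterministic_algorithms(True)
        torch.backends.cudnn.benchmark = False

None of this pins down data-loader worker ordering or cross-node reduction order, which is where the distributed case gets harder.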
The cost of adaptive-precision floats can be negligible, depending on the application. One example I'm familiar with from geometry processing: https://www.cs.cmu.edu/~quake/robust.html
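The building block behind those adaptive-precision predicates is an error-free transformation; here's a minimal Python sketch of Knuth's two-sum (the general idea, not Shewchuk's actual code):

    def two_sum(a: float, b: float) -> tuple[float, float]:
        # Returns (s, err) such that a + b == s + err exactly:
        # s is the rounded float sum, err is the rounding error it discarded.
        s = a + b
        b_virtual = s - a
        a_virtual = s - b_virtual
        err = (a - a_virtual) + (b - b_virtual)
        return s, err

    s, err = two_sum(1e16, 1.0)
    print(s, err)   # 1e+16 1.0 -- the low-order part the rounded sum dropped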
Integer math often carries no performance penalty compared to floating point.
I guess my takeaway from this conversation is that there's a market for fast high-precision math techniques in the AI field.