I’m pretty sure the initial weights are randomized, meaning no two models will train the same way twice. The order in which you feed training data to the model would also add an element of randomness. Model training is closer to growing a plant than running a compiler.
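
Rough sketch of those two randomness sources (NumPy and the array shapes are just for illustration):

```python
import numpy as np

# Two "fresh" models with no seed fixed: their initial weights already differ...
w1 = np.random.randn(3)
w2 = np.random.randn(3)
print(np.array_equal(w1, w2))          # False

# ...and so does the order the training examples would be visited in.
order1 = np.random.permutation(10)
order2 = np.random.permutation(10)
print(np.array_equal(order1, order2))  # False (almost surely)
```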

That's still a deterministic algorithm. The random initialization and the order in which training data is fed in are themselves inputs that determine the output. Again, if you do it twice the same way, you'll get the same output.
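
Toy illustration of what I mean, using PyTorch (the model, data, and hyperparameters are made up for the example): once the seed is fixed, the init and the "random" data are fixed too, and two runs match bit for bit on CPU.

```python
import torch

def train_once(seed: int) -> torch.Tensor:
    # Seeding pins down the "random" parts: weight init and the synthetic data.
    torch.manual_seed(seed)
    model = torch.nn.Linear(4, 1)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    data = torch.randn(32, 4)
    target = data.sum(dim=1, keepdim=True)
    for _ in range(100):
        opt.zero_grad()
        loss = ((model(data) - target) ** 2).mean()
        loss.backward()
        opt.step()
    return model.weight.detach().clone()

# Same seed, same data, same order -> bit-identical weights (on CPU).
print(torch.equal(train_once(0), train_once(0)))  # True
# Different seed -> different init and data -> different weights.
print(torch.equal(train_once(0), train_once(1)))  # False
```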

If they saved the initial randomized model and released it, and there was no random bit flipping during copying, then possibly, but it would still be difficult once you factor in the RLHF that comes from random humans interacting with the model to tweak its workings. If you preserved that data as well, and got all of the initial training right... maybe. But I'd bet against it.

So long as the data provided was identical, and sources of error like floating-point differences due to hardware implementation details were accounted for, I see no reason the output wouldn't be identical.
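
For concreteness, the hardware wrinkle I'm setting aside is that floating-point addition isn't associative, so a reduction summed in a different order (as parallel GPU kernels may do from run to run) can give a slightly different result:

```python
# Floating-point addition is not associative: the order of a reduction matters.
a, b, c = 1e16, -1e16, 1.0
print((a + b) + c)   # 1.0
print(a + (b + c))   # 0.0  (b + c rounds back to -1e16)
```

Frameworks expose switches to force deterministic kernels (e.g. PyTorch's torch.use_deterministic_algorithms), usually at some cost in speed.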

Where would other non-determinism come from?

I'm open to there being another source. I'd just like to know what it would be. I haven't found one yet.

> if you do it twice the same way, you'll get the same output

Point to the science that says that, please. Current scientific knowledge doesn't agree with you.

> Current scientific knowledge doesn't agree with you.

I'd love a citation. So far you haven't even suggested a possible source for this non-determinism you claim exists.