What makes models non-deterministic isn't the training algorithm, but the initial weights being random.

Training is reproducible only if, besides the pipeline and data, you also start from the same random weights.

That would fall under "Feed the same data in and you'll get the same weights out." Lots of deterministic algorithms use a random seed.
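To illustrate the point (a minimal sketch, not anyone's actual training pipeline, using numpy and made-up shapes/seeds): a pseudorandom generator is itself deterministic given its seed, so "random" initial weights are perfectly reproducible.

    import numpy as np

    def init_weights(shape, seed):
        rng = np.random.default_rng(seed)       # pseudorandom, fully determined by the seed
        return rng.normal(0.0, 0.02, size=shape)

    w1 = init_weights((784, 128), seed=42)
    w2 = init_weights((784, 128), seed=42)
    assert np.array_equal(w1, w2)               # same seed -> bit-identical "random" weights

Feed those identical weights and the same data through the same deterministic pipeline and you get the same weights out.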

So is there no “introduce randomness” step somewhere afterwards? If not, I would guess these models would get stuck in a local maximum.

> If not, I would guess these models would get stuck in a local maximum.

It sounds like you're referring to something like simulated annealing. Using that as an example, the fundamental requirement is to introduce arbitrary, uncorrelated steps; there's no requirement that the steps be truly random. The only potential advantage of a truly random source is that it provides independence (lack of correlation) inherently, but in exchange it makes testing and reproduction much harder. Basically every use of simulated annealing or similar that I've run into uses pseudorandom numbers for this reason.
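A rough sketch of what I mean (toy objective of minimizing f(x) = x^2, all parameters made up): the annealer only needs arbitrary, uncorrelated proposal steps, and a seeded PRNG supplies those while keeping the whole run reproducible.

    import math
    import random

    def anneal(seed, steps=10_000, temp=10.0, cooling=0.999):
        rng = random.Random(seed)               # pseudorandom: deterministic given the seed
        x = rng.uniform(-10, 10)
        best = x
        for _ in range(steps):
            candidate = x + rng.uniform(-1, 1)  # arbitrary, uncorrelated step
            delta = candidate**2 - x**2
            # always accept downhill moves; accept uphill moves with a temperature-dependent probability
            if delta < 0 or rng.random() < math.exp(-delta / temp):
                x = candidate
            if x**2 < best**2:
                best = x
            temp *= cooling
        return best

    # Same seed -> same sequence of "random" steps -> identical result,
    # which is exactly what makes testing and reproduction easy.
    assert anneal(seed=1) == anneal(seed=1)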