That's at training time, not inference time. And temp/top_p aren't used to escape local minima; methods like SGD batch sampling, Adam, dropout, LR decay, and other techniques do that.
Ahh okay, so you really can't escape the indeterminacy?
You can zero out temperature and get determinism at inference time, which is separate from training time, where you need some form of randomness to learn.
The point is that the randomness in the quote "all LLMs that I know of rely on entropy and randomness to emulate human creativity" is a runtime parameter you can tweak down to zero, not a fundamental property of the technology.
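To illustrate that temperature is just a runtime knob, here's a toy sketch in plain NumPy (not any particular framework's API, and the function name is made up for illustration): at temperature 0 the sampler degenerates to a greedy argmax, so the same logits always give the same token; any temperature above 0 reintroduces randomness at decode time.

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, rng=None):
    """Toy next-token sampler.

    temperature == 0 -> greedy argmax, fully repeatable for the same logits.
    temperature > 0  -> sample from the temperature-scaled softmax (stochastic).
    """
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64)

    if temperature == 0:
        # Deterministic: always pick the highest-scoring token.
        return int(np.argmax(logits))

    # Temperature rescales the logits before the softmax;
    # lower temperature sharpens the distribution, higher flattens it.
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()

    # Stochastic: the chosen token depends on the RNG draw.
    return int(rng.choice(len(probs), p=probs))
```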
Right, but my point is that even if you turn the temperature all the way down, you're not guaranteed an accurate or truthful result. You may get a mostly repeatable, deterministic output, but there is still some indeterminacy.