Hacker News

The probability distribution that the model outputs is deterministic. The decoding method that uses that distribution to decide what next token to emit may or may not be deterministic. If we decide to define the decoding method as part of "the model", then I guess the model is probabilistic.

It's also worth noting that the parameters (weights and biases) of the model are random variables, technically speaking, and this can be considered probabilistic in nature. The parameter estimates themselves are not random variables, to state the obvious. The estimates are simply numbers.