> You surely aren't implying that the model is sentient or has any "desire" to give an answer, right?

The model is a probabilistic machine: it was trained to generate completions and then fine-tuned to produce chat-style interactions. Given the prompt and the weights, there is some output that is most likely under the model. That’s what one could call the model’s “desired” answer, if you want to anthropomorphize. When you constrain which tokens can be sampled at a given timestep, you by definition diverge from that most-likely output.
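To make that concrete, here's a toy sketch of what constrained decoding does at one timestep: mask the logits down to the tokens a grammar allows before picking, so the selected token can differ from the model's unconstrained argmax. The vocabulary and logit values are made up for illustration.

```python
import math

# Hypothetical next-token logits over a tiny vocabulary.
logits = {"yes": 2.1, "maybe": 1.8, "no": 1.3, "{": 0.4, "}": 0.2}

def softmax(scores):
    m = max(scores.values())
    exps = {t: math.exp(s - m) for t, s in scores.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}

# Unconstrained: the token the model itself finds most likely.
best = max(softmax(logits), key=softmax(logits).get)

# Constrained: suppose a JSON grammar only permits "{" here.
# Masking to the allowed set forces a different token than the argmax.
allowed = {"{"}
masked = {t: s for t, s in logits.items() if t in allowed}
constrained_best = max(softmax(masked), key=softmax(masked).get)

print(best)             # "yes" -- the model's unconstrained pick
print(constrained_best) # "{"   -- forced by the constraint
```

Renormalizing over the allowed set is what real constrained-sampling implementations do; the point is simply that whenever the mask excludes the unconstrained argmax, the output necessarily diverges from the model's most likely completion.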