So why don’t we all use LLMs with temperature 0? If we separate models (incl. parameters) into two classes, c1: temp=0, c2: temp>0, why is c2 so widely used vs c1? The nondeterminism must be viewed as a feature more than an anti-feature, making your point about temperature irrelevant (and pedantic) in practice.
And then the 3rd time it shows up differently leaving you puzzled on why that happened.
The deterministic has a lot of 'terms and conditions' apply depending on how it's executing on the underlying hardware.
So why don’t we all use LLMs with temperature 0? If we separate models (incl. parameters) into two classes, c1: temp=0, c2: temp>0, why is c2 so widely used vs c1? The nondeterminism must be viewed as a feature more than an anti-feature, making your point about temperature irrelevant (and pedantic) in practice.