Hm, why T=-0.0001 instead of T=-1?
Also, I wonder: if you sampled a lot of text at temperature -1, then trained a new model on that text, and then sampled the resulting model at T=-1, would you get anything meaningful?
From the article:
"As temperature approaches zero from the negative side, the model output will again be deterministic — but this time, the least likely tokens will be output."
I read this as: a negative temperature far from zero is also quite random, just with a distribution skewed toward the unlikely tokens.
Yep! Very large negative temperatures and very large positive temperatures have essentially the same distribution. This is clearer if you consider thermodynamic beta, β = 1/T, where T = ±∞ (of either sign) corresponds to β = 0.
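To make that concrete, here's a minimal sketch (plain NumPy with made-up logits, not from the article) of the temperature softmax p_i ∝ exp(logit_i / T): T just below zero concentrates all mass on the least likely token, while large |T| of either sign is nearly uniform (β = 1/T ≈ 0).

    import numpy as np

    def temperature_probs(logits, T):
        # Softmax with temperature: p_i is proportional to exp(logit_i / T),
        # i.e. exp(beta * logit_i) with thermodynamic beta = 1/T.
        z = logits / T
        z = z - z.max()            # shift for numerical stability
        p = np.exp(z)
        return p / p.sum()

    logits = np.array([4.0, 2.0, 1.0, -1.0])   # hypothetical token logits

    for T in (0.0001, -0.0001, 1.0, -1.0, 1e6, -1e6):
        print(f"T={T:>10}: {np.round(temperature_probs(logits, T), 3)}")

    # T= 0.0001 -> [1, 0, 0, 0]             (deterministic: most likely token)
    # T=-0.0001 -> [0, 0, 0, 1]             (deterministic: least likely token)
    # T= +/-1e6 -> ~[0.25, 0.25, 0.25, 0.25] (beta ~ 0: nearly uniform either way)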