In a way, negative temperature is higher than the highest positive temperature. Very high positive temperatures just give you a uniform distribution over all possible tokens, and very large negative temperatures give the same behavior. As the temperature approaches zero from the negative side, you place more and more weight on unlikely tokens.
This makes more intuitive sense if inverse temperature is the physically relevant quantity, since you then have a smooth change as you cross from positive inverse temperature into negative, with zero standing for a uniform distribution and high positive (resp. negative) inverse temperatures just placing more and more weight on likely (resp. unlikely) tokens.
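A minimal sketch of the point above, assuming the usual softmax sampling rule where token probabilities are proportional to `exp(beta * logit)` with `beta = 1/T`. The function name `token_probs` and the example logits are made up for illustration, not taken from any particular library.

```python
import math

def token_probs(logits, beta):
    """Softmax over logits with inverse temperature beta = 1/T (hypothetical helper)."""
    scaled = [beta * x for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative logits: token 0 is the most likely, token 2 the least likely.
logits = [2.0, 1.0, 0.0]

# Sweep beta smoothly from positive through zero to negative.
for beta in (5.0, 1.0, 0.0, -1.0, -5.0):
    print(beta, [round(p, 3) for p in token_probs(logits, beta)])
```

Sweeping `beta` through zero shows nothing special happening there: the distribution passes through uniform at `beta = 0` and then starts favoring the least likely token, whereas the same sweep written in terms of `T` would have to jump from `+inf` to `-inf`.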
This is such a good way to put it (and it falls cleanly out of the exponential equation).
> inverse temperature is the physically relevant
right there in the equation!
This was super clear and interesting, thanks!