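A temperature of 1.0 is "natural units". If your energy is measured in nats, you should use temperature 1.0. If your energy is measured in bits, you should use temperature 1/ln(2) ~= 1.44: one bit is ln(2) nats, so an energy of E bits has to enter a nat-valued objective as E * ln(2), which is E / (1/ln(2)). The optimization pressure is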
max nats = max (entropy + energy / temperature)
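A minimal Python sketch of the temperature choice, assuming a discrete set of options with a known energy for each (the energies below are hypothetical). Maximizing entropy + energy / temperature over a probability distribution gives the Boltzmann distribution p_i proportional to exp(E_i / T), and the maximum value is ln(sum_i exp(E_i / T)) nats. With energies in bits and T = 1/ln(2), this becomes p_i proportional to 2^E_i. The names boltzmann_policy, objective_nats, T_NATS and T_BITS are illustrative, not from any library.

```python
import math

# Energies measured in nats: one nat of energy is one nat of objective.
T_NATS = 1.0
# Energies measured in bits: 1 bit = ln(2) nats, so E / T must equal
# E * ln(2), which gives T = 1 / ln(2) (about 1.44).
T_BITS = 1.0 / math.log(2)


def boltzmann_policy(energies, temperature):
    """Return p_i proportional to exp(E_i / T), the maximizer of the objective."""
    scaled = [e / temperature for e in energies]
    top = max(scaled)  # subtract the max before exponentiating, for stability
    weights = [math.exp(s - top) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def objective_nats(probs, energies, temperature):
    """Entropy (in nats) plus expected energy divided by temperature."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    expected_energy = sum(p * e for p, e in zip(probs, energies))
    return entropy + expected_energy / temperature


if __name__ == "__main__":
    # Hypothetical energies, measured in bits.
    energies_bits = [0.0, 1.0, 2.0, 3.0]
    probs = boltzmann_policy(energies_bits, T_BITS)
    best = objective_nats(probs, energies_bits, T_BITS)
    # probs are proportional to 2**E, i.e. 1/15, 2/15, 4/15, 8/15
    print(probs)
    # the optimum equals ln(1 + 2 + 4 + 8) = ln(15) nats, i.e. log2(15) bits
    print(best, math.log(15))
    # any other distribution, e.g. uniform, scores lower
    print(objective_nats([0.25] * 4, energies_bits, T_BITS) < best)
```

Using T = 1/ln(2) here means the optimal value, ln(15) nats, is exactly log2(15) bits: the objective counts the same "number of options" either way, just in different units.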
Why might energy correspond to bits or nats? Imagine your goal is to play as many interesting games of chess as possible in a tournament. To keep playing, you have to keep winning. Viewed from the right perspective, this RL environment turns into a problem of optimizing bits or nats.