"This transforms a vector of arbitrary real numbers into values between 0 and 1 that sum to 1"
Not really: softmax transforms logits (logarithms of probabilities) into probabilities.
Probabilities → logits → back again.
Start with p = [0.6, 0.3, 0.1]. Logits = log(p) = [-0.51, -1.20, -2.30]. Softmax(logits) = original p.
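That round trip is easy to check with a hand-rolled softmax (a sketch, not any particular library's implementation):

```python
import math

def softmax(xs):
    # subtract the max for numerical stability; this doesn't change the result
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

p = [0.6, 0.3, 0.1]
logits = [math.log(x) for x in p]   # ≈ [-0.51, -1.20, -2.30]
recovered = softmax(logits)         # ≈ original p
```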
NNs prefer to output logits because they come straight out of a linear layer and can range freely from -inf to +inf.
Softmax is defined over an arbitrary vector of raw real numbers. Stating that those inputs are "logits" is applying post-hoc semantics to what the model is learning. One of the key properties of softmax is shift invariance (e.g. softmax([-1, 1, 3, 5]) == softmax([9, 11, 13, 15])), so it is easiest to just think of it as operating on a vector of unnormalized raw scores, which is the more colloquial definition of logit.
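You can verify that invariance directly: adding a constant to every input leaves the output unchanged, while multiplying the inputs (temperature scaling) does not. A quick sketch with a hand-rolled softmax:

```python
import math

def softmax(xs):
    m = max(xs)  # max-subtraction is itself an application of shift invariance
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

scores = [-1, 1, 3, 5]
a = softmax(scores)
b = softmax([x + 10 for x in scores])  # same shift as the example above
c = softmax([2 * x for x in scores])   # scaling, by contrast, sharpens the distribution
```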
(also, log(p) is not the formal definition of a logit)
It's still true that softmax transforms arbitrary vectors into probability vectors.
In your example you'll also get the original `p` with just `exp(logits)`. Softmax normalizes the output to sum to one, so it can output a probability vector even if the input is _not_ simply `log(p)`.
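A sketch of that point: `exp` alone recovers `p` in the example only because `p` already sums to one; shift the logits and `exp` no longer gives a probability vector, but softmax still does.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

p = [0.6, 0.3, 0.1]
logits = [math.log(x) for x in p]
just_exp = [math.exp(x) for x in logits]   # ≈ p, only because sum(p) == 1

shifted = [x + 5 for x in logits]          # no longer simply log(p)
exp_sum = sum(math.exp(x) for x in shifted)  # far from 1
normalized = softmax(shifted)              # still ≈ the original p
```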
Logit is log odds, not log probability: logit(p) = log(p / (1 - p))