What I haven't seen discussed anywhere so far is how big a lead Anthropic seems to have in intelligence per output token (see, e.g., [1]).
We already know that measured intelligence scales roughly with the log of the tokens spent on reasoning, but Anthropic seems to have much more powerful non-reasoning models than its competitors' (toy sketch below).
I read somewhere that they have a policy of not advancing capabilities too far ahead, so could it be that they are sandbagging: releasing models with artificially capped reasoning so that they land at roughly the same level as their competitors?
How do you read this?
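To make the log-scaling point concrete, here is a minimal sketch. Every number in it is invented for illustration (nothing is taken from [1]); it just shows why, under log scaling, a stronger non-reasoning baseline is an exponential token tax on whoever tries to catch up:

```python
import math

# Toy model of "intelligence scales with log(reasoning tokens)":
#   score(T) = base + slope * log2(T + 1)
# All numbers below are hypothetical, chosen only to illustrate the shape.

def score(base: float, slope: float, tokens: int) -> float:
    return base + slope * math.log2(tokens + 1)

def tokens_to_match(target: float, base: float, slope: float) -> float:
    """Reasoning tokens a log-scaling model needs to reach `target`."""
    return 2 ** ((target - base) / slope) - 1

# Hypothetical: model A has a stronger non-reasoning baseline than B.
base_a, base_b, slope = 60.0, 50.0, 2.0

for t in (0, 1_000, 10_000, 100_000):
    print(f"{t:>7} tokens  A={score(base_a, slope, t):5.1f}"
          f"  B={score(base_b, slope, t):5.1f}")

# With equal slopes, the 10-point intercept gap costs B a constant
# 2**(10/2) = 32x token multiplier at every budget.
print(f"B needs ~{tokens_to_match(score(base_a, slope, 1_000), base_b, slope):,.0f}"
      f" tokens to match A at 1,000 tokens")
```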
Intelligence per token doesn't seem quite right to me.
Intelligence per <consumable> feels closer. Per dollar, or per second, or per watt.
You can think of tokens as a rough proxy for thinking space; at least reasoning tokens work that way.
Dollar and watt figures aren't public, and wall-clock time has confounders like the underlying hardware.
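That said, list API prices give a rough public stand-in for the dollar side. A toy sketch of score-per-dollar, where every score, price, and token count is made up rather than a real benchmark or current list price:

```python
# Toy "intelligence per dollar" comparison. Scores, prices, and token
# counts are placeholders, not real benchmarks or actual list prices.

models = {
    # name: (benchmark_score, usd_per_1M_output_tokens, tokens_per_task)
    "model_a": (80.0, 15.0, 2_000),    # strong non-reasoning baseline
    "model_b": (82.0, 10.0, 20_000),   # slightly higher score, 10x tokens
}

for name, (score, usd_per_m, tokens) in models.items():
    cost = usd_per_m * tokens / 1_000_000  # dollars per task
    print(f"{name}: {score:.1f} pts at ${cost:.4f}/task"
          f" -> {score / cost:,.0f} pts per dollar")

# A reasoning-heavy model can win on raw score while losing badly
# per dollar once its token usage is priced in.
```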