What would be the incentive to engage in the tactic when the proof is ultimately in the pudding once the model hits the streets? Who would ultimately benefit from fudging these numbers?
Anthropic would definitely benefit, as benchmarks are almost always quite useless compared with real-life use.
How specifically would they benefit? People flock to them based on the hype, and then the model sucks and they leave?