Good point.
But I use Codex and Claude daily (work and hobby respectively). And there are days where one or the other just seems to have gotten up on the wrong side of the bed. Or is just being lazy. Or is suddenly super-powered and does everything, including what I asked it not to. (To be fair, the same thing happens to me. :/)
I am convinced that if I were benchmarking, I would conclude these are different models on different days.
[This conviction may say more about me than about the model.]