I'm not really bullish on OpenAI. Why would they only compare with their own models? The only explanation could be that they aren't as competitive with other labs as they were before.

Apple compares against its own products most of the time.

I don't mind what they benchmark against as long as, when I use the model, it continues to give me better results than their competition.

Go look at their past blog posts. OpenAI only ever benchmarks against their own models.

Oh, ok. But it's still quite telling of their attitude as an organization.

It's the same organization that kept repeating that sharing the weights of GPT would be "too dangerous for the world". Thankfully, DeepSeek eventually did something like that with its own models, even though they are supposed to be the evil guys.