lol I love how OpenAI just straight up doesn't compare their model to others on these release pages. Basically telling us they know Gemini and Opus are better but they don't want to draw attention to it

Not sure why they don't compare with others, but they are actually leading on the benchmarks they did publish. See here (bottom) for a chart comparing it to other models: https://marginlab.ai/blog/swe-bench-deep-dive/

It's like Apple: they just don't want users, or anyone else, even thinking about their competitors. The competition doesn't exist; it's not relevant.

Is SWE-bench saturated? Or did they switch to SWE-bench Pro because...?

At least on swe-rebench it does pretty well: https://swe-rebench.com/

This was the one thing I scanned for. No comparison against Opus. See ya.

Though this Codex version isn't on the leaderboard, GPT-5.2-Medium already seems to be a bit better than Opus 4.5: https://swe-rebench.com/

Is that your website or something? You keep promoting it.

No, I am not affiliated with the website. I just want to see more discussion based on uncontaminated benchmarks; people rely too much on benchmarks that the companies themselves can run, and in that case I don't feel I can trust the results. For general LLM capabilities, for example, I would also tend to rely on dubesor [1] rather than Artificial Analysis or similar leaderboards.

[1] https://dubesor.de/benchtable