There is no OpenAI model better than R1, reasoning or not (as confirmed by the same Aider benchmark; non-coding tests are less objective, but I think it still holds).

With Gemini (current SOTA) and Sonnet (great potential, but tends to overengineer/overdo things) it is debatable; they are probably better than R1 (and, by extension, than all OpenAI models).