I am very confident that these tools are now better than the median programmer at code review. They are certainly much more diligent. The useful standard to compare them against is human review, and for technical problems they definitely pass it. That said, they’re still not great at giving design feedback.
But GPT-5 Pro, and to a certain extent GPT-5 Codex, are remarkably good at spotting complex bugs like race conditions and subtly incorrect logic like memory misuse in C. It is a shame that GPT-5 Pro is locked behind a $200/month subscription, because it means most people do not realize just how good the frontier models now are at this type of task.