Not really; they're only as good as their context, and they do miss and forget important things. It doesn't matter how often: the point is that they do, and they'll tell you, with 100% confidence and every synonym of "sure," that they caught it all. That's the issue.
I am very confident that these tools are now better than the median programmer at code review. They are certainly far more diligent. The genuinely useful standard to compare them against is human review, and on technical problems they clearly pass it. That said, they're still not great at giving design feedback.
But GPT-5 Pro, and to some extent GPT-5 Codex, are remarkably good at spotting complex bugs like race conditions and subtly incorrect logic like memory misuse in C. It is a shame GPT-5 Pro is locked behind a $200/month subscription, which means most people don't realize just how good the frontier models have become at this kind of task.
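For a concrete sense of what "memory misuse in C" means here, below is a minimal, hypothetical sketch (the function and names are invented, not from any real codebase): a pointer computed before a realloc that silently dangles afterward, the kind of bug that often sails through human review.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical illustration of a subtle memory misuse: `end` is computed
   before realloc, so if realloc moves the block, the strcpy below writes
   through a dangling pointer into freed memory. */
char *append(char *buf, size_t *len, const char *s) {
    char *end = buf + *len;                   /* pointer into the OLD block */
    buf = realloc(buf, *len + strlen(s) + 1); /* may move the allocation */
    if (!buf) return NULL;                    /* error handling kept minimal */
    strcpy(end, s);                           /* BUG: use-after-free if moved */
    *len += strlen(s);
    return buf;
}

int main(void) {
    size_t len = 5;
    char *buf = malloc(len + 1);
    if (!buf) return 1;
    memcpy(buf, "hello", len + 1);
    buf = append(buf, &len, ", world"); /* often "works" until it doesn't */
    if (buf) { puts(buf); free(buf); }
    return 0;
}
```

Whether this actually crashes depends on whether realloc happens to move the block, which is exactly why human reviewers tend to miss it.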