Maybe it's relative? Claude beats GPT-4/o by a wide margin for me, but I am mostly using them for Rust.

I also think there are subtle differences in how models like to be prompted, so some people will have more luck with one model than with another.