Weird, I have had the opposite experience. Codex is good at doing precisely what I tell it to do, Opus suggests well thought out plans even if it needs to push back to do it.

This is just the stochastic nature of LLM's at play. I think all of the SOTA models are roughly equivalent, but without enough samples people end up reading into it too much.