I bet any flagship model would do just as well if you spelled out in the prompt how it should approach the task.

Comparing Grok vs Gemini vs GPT vs Sonnet is like comparing mid-to-high-end CPUs. They're all about as good as one another for most work.