Uncompetitive how? On what task and eval?
For me, Gemini is consistently the only model that can reason over long context in dynamic domains. Deep Think just did exactly that while reviewing an insane amount of Claude Code logs for a meta-analysis of the underlying implementation. Laughable to think Grok could do that.