Source? I ask because I use 500k+ context on these on a daily basis.

Big refactorings guided by automated tests eat context window for breakfast.

I find Gemini gets really bad when you get far into the context - it gets into loops, forgets how to call tools, etc.

Yeah, Gemini is dumb when you tell it to do stuff, but the things it finds (and, critically, confirms, including making tool calls to validate its hypotheses) in reviews absolutely destroy both GPT and Opus.

If you're a one-model shop, you're losing out on the quality of the software you deliver, today. I predict we'll all have at least two harness+model subscriptions as a matter of course in 6-12 months, since every model's jagged frontier is different at the margins, and the margins are very fractal.

I find Gemini does that normally, personally. Noticeably worse in my usage than either Claude or Codex.

I find Gemini to be really bad. Are you just using it for price reasons, or?

How many big refactorings are you doing? And why?

How is that relevant? We are talking about models, not what you do with them.