I'm doing something very similar but even simpler, and Gemini 3 is absolutely crushing it. I tried to do this with other models in the past, but it never really felt productive.

I don't even generate diffs, just full files (though I try to keep them small), and my success rate is probably close to 80% at one-shotting very complex coding tasks that would otherwise take me days.