To be honest even the cloud models are a hot mess at times. This week I’ve spent more time rejected code from OpenAI models than I have approving it.
In fact it really feels like OpenAI models have taken a nose dive this week compared with Claude. At least for my specific workloads (these things are so variable it’s like trying to compare Google results…)