For Python, at least, Sonnet’s code is much more elegant, well composed, and thoughtfully written. It also seems to be biased towards more recent code, whereas the GPT models can’t even properly write an API call to themselves.

o1 is pretty decent as a Roto-Rooter, i.e. for the type of task that requires both lots of instruction and lots of context. I honestly think half the reason it works as well as it does is that it’s able to properly mull over the user’s true intent, which usually takes the multiple shots that nobody has the patience for.

It is appalling how bad GPT-4o is at writing API calls to OpenAI using Python. It's as if OpenAI hasn't updated its own documentation in the GPT-4o training data since GPT-3.5.

I constantly have the problem that it thinks it needs to write code for the 0.28 version of the SDK. It'll write >=1.0 code for revision after revision, and then just randomly fall back to the old 0.28-style calls, which don't work at all anymore. I always use Claude to write code that interfaces with OpenAI's APIs.
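
For anyone who hits the same fallback, here is a rough sketch of a guard you could run over generated code before accepting it. It flags the module-level `openai.ChatCompletion.create` / `openai.Completion.create` calls typical of the 0.28-era SDK. The helper name `find_legacy_calls` and the two sample snippets are made up for illustration, and the pattern list is an assumption that covers only the most common legacy calls, not everything that changed in 1.0.

```python
import re

# Module-level calls typical of the pre-1.0 (0.28-era) openai SDK.
# Assumption: only the two most common endpoints are covered here.
LEGACY_RE = re.compile(r"\bopenai\.(?:ChatCompletion|Completion)\.a?create\b")


def find_legacy_calls(source: str) -> list[tuple[int, str]]:
    """Return (line number, stripped line) for each legacy-style call found."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if LEGACY_RE.search(line):
            hits.append((lineno, line.strip()))
    return hits


# Hypothetical model output in the old 0.28 style.
OLD_STYLE = '''import openai
resp = openai.ChatCompletion.create(model="gpt-4o", messages=msgs)
print(resp["choices"][0]["message"]["content"])
'''

# Hypothetical model output in the >=1.0 client style.
NEW_STYLE = '''from openai import OpenAI
client = OpenAI()
resp = client.chat.completions.create(model="gpt-4o", messages=msgs)
print(resp.choices[0].message.content)
'''

if __name__ == "__main__":
    for name, snippet in [("old", OLD_STYLE), ("new", NEW_STYLE)]:
        print(name, find_legacy_calls(snippet))
```

A plain regex will also match inside comments or strings, so treat a hit as a prompt to re-check the revision rather than as proof the code is broken.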