The most striking difference to me is that o3 and o4 know when the web search tool is unavailable, and will tell you they can't answer a question that requires it, while 4o and (sadly) 4.1 will just make up a bunch of nonsense.

I'm impressed that they can do that, and at the same time wondering why the heck it's so impressive (isn't "is this tool in this list?" something GPT-3 was able to handle?) and why 4.1 still fails at it, especially considering it's hyped as the agentic coder model!

It's pretty damning for the general intelligence aspect of it that they apparently had to special-case something so trivial... and I say that as someone who's really optimistic about this stuff!

That being said, the new "enhanced" web search seems great so far, and it means I can finally delete another stupid 10-line Python script from 2023 that I shouldn't have needed in the first place ;)

(...Now if they'd just put 4.1 in the Chat... why the hell do I need to use a third-party UI for their best model?!)