Outside of programming, I haven't gotten a good response from Opus (4.6 or 4.7). I tried optics, finance, and economics questions, and every answer had glaring oversights. 5.5 is the strongest and very thorough. 3.1 comes very close, and while less thorough, it completes the response in <2 min while 5.5 will spend 15-20 minutes.
Which raises the question: where would 3.1 be if Google let it run for 20 minutes on a prompt? Possibly worse, but you have to wonder.