Gemini isn't even that good. Just tested 3.5 on the usual complex prompts I give to Opus/chat 5.5. Meh.

Are you really comparing flash to opus? Shouldn't you be comparing pro?

The benchmark tables in the Google announcement include Opus 4.7, and the numbers are very impressive. Caveat emptor, but it's not unreasonable to compare a new Flash to a current-gen Opus, even if some of the results confirm expectations.

Who would have guessed that something costing roughly a third as much wouldn't do as well at certain tasks?

Well, my first impression is that Gemini still goes off the instruction rails more easily than other models, but I noticed that it tends to return to the initial goal without hand-holding, which is a real improvement. It's really interesting that these models behave so differently.