Hacker News

gemini isn't even that good. just tested 3.5 on my usual complex prompts against opus/chat 5.5. meh

Are you really comparing flash to opus? Shouldn't you be comparing pro?

The benchmark tables in the Google announcement include Opus 4.7, and the numbers are very impressive. Caveat emptor, but it's not unreasonable to compare a new Flash to a current-gen Opus, even if some of the results confirm expectations.

bachmeier 10 hours ago

Who would have guessed that something costing roughly a third as much wouldn't do as well at certain tasks.

kmac_ 10 hours ago

Well, the first impression is that Gemini still goes off the instruction rails more easily than other models, but I noticed that it tends to return to the initial goal without hand-holding, which is a real improvement. It's really interesting that these models behave so differently.