Is no one talking about how this Flash beats Pro? Imagine what 3.5 Pro looks like.
Also concerned about Gemini models being benchmaxxed generally.
> concerned about Gemini models being benchmaxxed generally
I would say they are the least benchmaxxed of all the top labs, for coding. They've always been behind Opus/GPT-xhigh for agentic stuff (mostly because of poor tool use), but in raw coding tasks and in the ability to take a paper/blog/idea and implement it, they've been punching above their benchmarks ever since 2.5. I would still take 2.5 over all the "Chinese model beats Opus" releases if I could run it locally, tbh.
I have never had a good experience with any Google model for coding. Particularly for hard coding problems, there is a night-and-day difference between Opus and Gemini in my experience.