LiveBench was good, but now it's a joke. Gemini Flash is better at coding than Pro and Sonnet 3.7. And this is only the beginning of the weird results.
Flash is better than Pro at coding? Whoa... [makes a note to try a few things later today]
Out of curiosity, how did you gauge that?
I think your parent comment is citing that as an example of why LiveBench is no longer a good benchmark. That said, the new Flash is very good for what it is, and IMO after the Pro 05-06 nerfs the two models are much closer in performance for many tasks than they really should be. Pro should be (and was) way better (RIP 03-25 release). That LiveBench result may be wrong about the specific ranking, but I think it's right that Flash is in the same class of coding strength as Sonnet 3.7.
Thanks, that's very informative.
My ignorance is showing here: why is the Pro 05-06 a nerf?