Models have their “personalities” for sure, but the idea that the expensive model is better may just be confirmation bias.
(There was a blind test in Wine Enthusiast magazine - even sommeliers couldn’t tell expensive wines from cheaper alternatives.)
But ofc if you get perfect results in one shot from an expensive model, that is cheaper than wrangling with a cheap model for an hour… (just an example).
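To put rough numbers on that trade-off, here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration: the `total_cost` helper is hypothetical, and the prices, attempt counts, and hourly rate are made up, not measured from any real model.

```python
def total_cost(price_per_attempt, attempts, minutes_per_attempt, hourly_rate):
    """Model spend plus the value of the time you spent on it (hypothetical helper)."""
    model_spend = price_per_attempt * attempts
    time_spend = (attempts * minutes_per_attempt / 60) * hourly_rate
    return model_spend + time_spend


# All numbers below are invented for illustration only.
# Expensive model: one attempt that works.
expensive = total_cost(price_per_attempt=2.00, attempts=1,
                       minutes_per_attempt=5, hourly_rate=60)

# Cheap model: twelve attempts of five minutes each, i.e. an hour of wrangling.
cheap = total_cost(price_per_attempt=0.10, attempts=12,
                   minutes_per_attempt=5, hourly_rate=60)

print(f"expensive: {expensive:.2f}, cheap: {cheap:.2f}")
```

With these made-up inputs the cheap model ends up costing more overall, because the hour of your time dominates the token price. Change the attempt count or hourly rate and the result can easily flip.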
But what I find hard is navigating so many available models - HuggingFace has 2,769,687 models listed…
So every comparison like this one, or the ones at models.dev or arena.ai, is good.