Yeah, the funniest thing about everyone freaking out about Fable's capabilities recently was that for most of the stuff they were amazed by, you could get roughly the same result from DeepSeek Flash.
I used to be obsessed with which model was the best. Then a while back, when the new best model came out, I tested it on a task. I also tested its little brother (a much smaller model from the same company).
They both completed the task perfectly except the "best" model (the bigger one) cost 5x more and took 3x longer...
"Best model" discourse always reminds me of my days in Monster Hunter, with people who refused to consider playing with anything other than the meta set for their weapon and then proceeded to immediately cart right at the beginning of the hunt :)
With the wealth of models available (open source vs closed, API vs local), I find that optimizing the cost-efficiency of your token consumption is an important part of business-oriented AI engineering. You don't need "the best" for every task.
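To make that concrete, here's a rough sketch of the "cheapest model that gets the job done" idea. Everything in it is made up for illustration: the `Model` class, the per-task costs, and the `run` callables are stand-ins for whatever API or local call you'd actually use, and `passes` is whatever check you'd accept as "task done".

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Model:
    name: str                   # hypothetical label, not a real model ID
    cost_per_task: float        # assumed average cost per task, in dollars
    run: Callable[[str], str]   # stand-in for a real API or local inference call


def cheapest_passing(
    models: Iterable[Model],
    task: str,
    passes: Callable[[str], bool],
) -> Optional[Model]:
    """Try models from cheapest to most expensive; return the first that passes."""
    for model in sorted(models, key=lambda m: m.cost_per_task):
        if passes(model.run(task)):
            return model
    return None  # no model produced an acceptable result


# Toy stand-ins: both "models" solve the task, but one costs 5x more.
small = Model("small", 0.01, lambda t: t.upper())
big = Model("big", 0.05, lambda t: t.upper())

chosen = cheapest_passing([big, small], "hello", lambda out: out == "HELLO")
print(chosen.name if chosen else "no model passed")
```

In practice the hard part is the `passes` check, not the loop. If you can't cheaply tell whether the small model succeeded, the escalation doesn't save you much.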
A lot of the monetization strategies for LLMs depend on the need to use them via SaaS subscriptions. If companies start to realize that local AI is cheaper, provides good enough results, and makes them independent, then that monetization strategy falls apart and a whole industry collapses.
> They both completed the task perfectly except the "best" model (the bigger one) cost 5x more and took 3x longer...
Same for me, I certainly don't have the same definition of success and failure either.
A more expensive model has *less* room for wandering around than a cheaper model.
If Claude wanders around for 10 minutes before finding the most obvious solution, then I count it as a failure.