The problem is that the differences between flagship and local models are compounding heavily. An 4% different could be massive when you keep iterating on the same code base.
The problem is that the differences between flagship and local models are compounding heavily. An 4% different could be massive when you keep iterating on the same code base.
> The problem is that the differences between flagship and local models are compounding heavily
This depends a lot on how you work, and how much of the architectural thinking you do yourself.
People seem to lose sight of the fact that a flash model today is as powerful as a frontier model from a year ago. If you were happy with GPT 4.x, you should be ecstatic that equivalent power is now basically free...
I am one of those ecstatic folk :)