I don’t think it’s the cli that was dumber, just the model it was using. They drastically reduced limits on their best model so that’s likely how you got stuck downgrading model and getting worse results.
I don’t think it’s the cli that was dumber, just the model it was using. They drastically reduced limits on their best model so that’s likely how you got stuck downgrading model and getting worse results.
I'm sensing in reality that behind the scenes there is a difficult trade-off between quantization and usage limits. You can have a "smart" model but poor limits, or good limits and a "dumb" model.
This seems very similar to mobile data limits (remember those years?), where there wasn't enough tower bandwidth to serve everyone unlimited data, so telecos were in constant tension between data caps and bandwidth throttling.
It wasn't until 5G came along with 100x network capacity that they could finally give everyone "unlimited" data.