It’s deeply ironic that the folks who want to outsource as much thought to the model as possible are saying that my stance - use your brain to decide the right tool for the job - is tantamount to “vibes”.

You are being deeply reductive, and that's against the spirit of Hacker News. The issue is that models are difficult to benchmark objectively. The benchmarks don't always align with real-world performance, so it's neither easy nor clear-cut to determine which model will work best in a given situation. It boils down to loose experience and anecdotes. Do you have an objective criterion for model selection that you have tested to be effective, with reproducible results?