> This is a terrible line of thought
They're basing all of this on public benchmarks, which stopped being a reliable indicator of anything over the last 2-3 years. Of course it'll be filled with more terrible lines of thought.
People really need to stop placing so much importance on public benchmarks. They're valuable for comparing very similar models and for checking whether quantization and similar changes have a negative impact, but you're not gonna be able to tell whether one model is better than a completely different one just because it scores a few percentage points higher.