no? it's better on AIME '24, Multilingual MMLU, SWE-bench, Aider’s polyglot, MMMU, ComplexFuncBench
and it ties on a lot of benchmarks
look at all the graphs in the article
the data i posted all came from the graphs/charts in the article