
no? it's better on AIME '24, Multilingual MMLU, SWE-bench, Aider’s polyglot, MMMU, ComplexFuncBench

and it ties on a lot of benchmarks

look at all the graphs in the article

the data i posted all came from the graphs/charts in the article