Artificial Analysis is about as benchmark-maxxed as they come. Aggregating tons of benchmark-maxxed scores still leaves you benchmark-maxxed.
There's simply no replacement for training on more, better tokens, with more parameters. Mythos/Fable was estimated to be closer to 10T parameters than to the 800B of a model like GLM 5.2.