How are they able to compare with Fable when Fable was only available for three days?

Terminalbench numbers are publicly available. The more interesting question is why that is the only benchmark they highlight. Maybe 5.6 isn’t that far ahead of Fable 5 on DeepSWE and FrontierCode (which I consider the most useful, and the closest to my own evals and subjective experience)…