Among benchmarkers it's a frequent topic. Qwen BURNS reasoning to get its scores.