What they call "cheating" has nothing to do with the model being misaligned; it's a symptom of their benchmark sucking.