Similar to how human brains behave, it is easier to train a model to select a better solution between many choices than to check an individual solution for correctness [1], which is in turn an easier task to learn than writing a correct single solution in the first place.

[1] the diffs in logic can suggest good ideas that may have been missed in subsets of solutions.