Best-of-3 would be cheating and would ruin the test; middle-of-3 makes more sense.

Why would you need the 3rd run if you pick the "one in the middle"?

Middle as in not the best and not the worst, as opposed to the second one generated in sequence.

But "not the best, not the worst" is somewhat subjective, so I'm not sure how well that would work.
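
For what it's worth, if each run could be given a numeric score, picking the middle one is just taking the median of three. Here is a rough sketch of that idea in Python. The `middle_of_three` helper, the run labels, and the scores are all made up for illustration; nothing in this thread says how the runs would actually be scored, and that scoring step is exactly the subjective part.

```python
def middle_of_three(runs, score):
    """Return the run whose score is neither the highest nor the lowest.

    `score` is a hypothetical scoring function (run -> number); how to
    score a run is an assumption, not something defined in this thread.
    """
    if len(runs) != 3:
        raise ValueError("expected exactly 3 runs")
    # Sort by score and take the one in the middle (the median).
    ranked = sorted(runs, key=score)
    return ranked[1]


# Illustrative placeholder scores only, not real results.
runs = ["run A", "run B", "run C"]
scores = {"run A": 0.4, "run B": 0.9, "run C": 0.7}

picked = middle_of_three(runs, scores.get)
print(picked)  # run C
```

Note that the result is "run C" (the middle by score), not "run B" (the second in sequence), and all three runs are needed to know which one that is.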