Turns out what was being rewarded all along is "the code looks all right" and "it looks like it works".

No, what is rewarded is "the code has been shown to conform to the given objective function" and especially "that objective function is a good representation of what we are trying to accomplish with this code".