RLHF really isn't the problem when it comes to surpassing human capability: language models trained to mimic human responses are fundamentally going to do nothing but mimic human responses, no matter how you fine-tune them toward the specific kinds of human responses you do or don't like.

If you want to exceed human intelligence, design architectures for intelligence, not for copying humans!