One of the problems lies in the way RLHF is often performed: presenting a human with several different responses and having them choose one. The goal is to produce the most human-like output, but the process instead optimizes for the outputs humans like best, which can seriously limit the model. Most recent diffusion-based image generators, for example, use the same process to improve their outputs, relying on volunteers to select which outputs they prefer. This has led to models that are comically incapable of generating ugly or average-looking people, because volunteers systematically rate those outputs lower.
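To make the mechanism concrete, here is a minimal sketch of the pairwise-preference objective commonly used to train RLHF reward models (a Bradley-Terry style loss; the function name and scalar scores are illustrative, not from any particular library). The training signal only encodes *which* response annotators preferred, so any systematic bias in their choices flows straight into the reward model.

```python
import math

def preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry style pairwise loss: the loss shrinks as the
    reward model scores the human-chosen response above the rejected
    one. Nothing here measures 'human-likeness' -- only preference."""
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

# Rewards that agree with the annotator's choice yield low loss;
# rewards that disagree yield high loss.
print(round(preference_loss(2.0, 0.5), 4))  # 0.2014 (agrees)
print(round(preference_loss(0.5, 2.0), 4))  # 1.7014 (disagrees)
```

Because the gradient pushes scores apart purely along the axis of annotator preference, outputs that annotators consistently disfavor (such as average-looking faces) are penalized regardless of how realistic they are.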