The distinction is that LLMs are not used for what they are trained for in this case. In the vast majority of cases someone using an LLM is not interested in what some mixture of openai employees ratings + average person would say about a topic, they are interested in the correct answer.

When I ask chatgpt for code I don't want them to imitate humans, I want them to be better than humans. My reward function should then be code that actually works, not code that is similar to humans.