It sounds really negative about RLHF. Yet, if I'm reading about them correctly, that's a big part of how ChatGPT and Claude got so effective. There are companies collecting quality human responses to many prompts, and the companies making models buy them. Even the synthetic examples come from models that largely extrapolate what humans wrote in their pre-training data.

So, I'm defaulting to "RLHF is great in at least those ways" until an alternative is empirically proven to be better. I also hope for larger, better, open-source collections of RLHF training data.

Claude notably does not use RLHF, but RLAIF, using an LLM to generate the preferences based on a "constitution" instead of human preferences. It's remarkable that it can bootstrap itself up to such high quality. See https://arxiv.org/pdf/2212.08073 for more.
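
To make the RLAIF idea concrete, here's a minimal sketch of the preference-labelling step: the feedback LLM is shown two candidate responses and asked, under a constitutional principle, which one is better; the resulting (chosen, rejected) pairs stand in for human comparisons when training the preference model. The `query_model` function, the constitution text, and the prompt format are all illustrative placeholders, not Anthropic's actual setup.

```python
# Sketch of RLAIF preference labelling in the style of Constitutional AI
# (https://arxiv.org/pdf/2212.08073). Everything here is a hypothetical
# stand-in: swap query_model for a real LLM API and supply your own
# constitution and prompt template.

import random

CONSTITUTION = [
    "Choose the response that is more helpful, honest, and harmless.",
    "Choose the response that is less likely to encourage unethical behavior.",
]

def query_model(prompt: str) -> str:
    """Hypothetical call to the feedback LLM; replace with a real client."""
    raise NotImplementedError

def label_preference(prompt: str, response_a: str, response_b: str) -> dict:
    """Ask the feedback LLM which response better satisfies a sampled principle."""
    principle = random.choice(CONSTITUTION)
    judge_prompt = (
        f"Consider the conversation and two candidate responses.\n\n"
        f"Human: {prompt}\n\n"
        f"Response (A): {response_a}\n"
        f"Response (B): {response_b}\n\n"
        f"{principle}\n"
        f"Answer with a single letter, A or B."
    )
    verdict = query_model(judge_prompt).strip().upper()
    if verdict.startswith("A"):
        chosen, rejected = response_a, response_b
    else:
        chosen, rejected = response_b, response_a
    # These pairs replace human comparison data when training the
    # preference/reward model used in the subsequent RL stage.
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}
```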

I thought Claude used human feedback, since Surge claims Anthropic as a customer:

https://www.surgehq.ai/case-studies/anthropic-claude-surgeai...