Claude notably does not use RLHF but RLAIF, using an LLM to generate the preferences based on a "constitution" instead of collecting them from humans. It's remarkable that it can bootstrap itself up to such high quality. See https://arxiv.org/pdf/2212.08073 for more.

I thought Claude used human feedback, since Surge claims Anthropic as a customer:

https://www.surgehq.ai/case-studies/anthropic-claude-surgeai...