It's a bit disingenuous to pick Go as a case to make the point against RLHF.

Sure, a board game with an objective win condition, at which computers are already better than humans, won't gain much from RLHF. That's hardly a surprise.

On the other hand, an LLM trained on lots of loosely curated data will naturally pick up mistakes from that dataset. It isn't really feasible or beneficial to fix the dataset exhaustively, so instead you reinforce the behaviour you want at the end. An example would be training an AI for a specific field of work: it could repeat advice from amateurs on forums when lesser-known professional techniques would be more advisable.
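
To make that concrete, here's a toy sketch (all names hypothetical, not any real RLHF implementation) of the idea: a reward model learned from human preferences scores candidate outputs, and the behaviour raters prefer is what gets reinforced, rather than editing the training data itself.

```python
def reward_model(answer: str) -> float:
    """Stand-in for a learned preference model: it scores answers higher
    when they match what human raters prefer, e.g. professional advice
    over tips absorbed from amateur forum posts."""
    return 0.0 if answer.startswith("amateur forum tip") else 1.0


def pick_reinforced(candidates: list[str]) -> str:
    """Keep the candidate the reward model scores highest, mimicking how
    RLHF nudges the policy toward preferred behaviour."""
    return max(candidates, key=reward_model)


candidates = [
    "amateur forum tip: just do X",          # behaviour picked up from uncurated data
    "professional technique: do Y instead",  # behaviour human raters prefer
]
print(pick_reinforced(candidates))  # -> "professional technique: do Y instead"
```

In a real setup the reward model is itself trained on human preference comparisons and the policy is updated with RL against it, but the point is the same: you shape the output distribution after the fact instead of scrubbing the dataset.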

Think of it like kids naturally picking up swear words at school, and RLHF as the parents telling them those words are inappropriate.

The tweet's conclusion seems to acknowledge this, but in a wishful way that stops short of conceding the point.