Hacker News

I’ve been doing RLHF and adjacent work for 6 months. The model responses across a wide array of subject matter are surface level: logical reasoning, mathematics, step-by-step explanations, summarization, extraction, generation. It’s the kind of output an average C student produces.

We specifically don’t do programming prompts/responses or advanced college to PhD-level stuff, but it’s really mediocre at the level and in the subject areas we do cover. Programming might be another story; I can’t speak to that.

All I can go off is my experience, but it’s not been great. I’m willing to be wrong.