Hacker News

jeremyjh a day ago

I think that’s mostly because they get so much more reinforcement learning on those tasks, since it is so economical there. I don’t know if there is any evidence of a fundamental reason they can’t be just as good at other tasks, but it might be economically infeasible for a while yet.

mjburgess a day ago

No one is curating vast amounts of data for them in other domains. Programmers send programs with fixes.

jeremyjh 4 hours ago

It’s more about how costly it is to verify work in reinforcement learning. It is cheap in mathematics and coding because verification can be automated. It is expensive in other domains because, while you can capture certain datasets for pre-training, you ultimately need humans in the loop to judge the quality of the work.
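
To make "verification can be automated" concrete, here is a minimal sketch of a reward signal for a coding task. Everything in it is an assumption for illustration: `verifiable_reward`, the two candidate functions, and `CASES` are hypothetical names, not part of any real training pipeline. It only shows that scoring a program against test cases needs no human judge.

```python
from typing import Callable, Iterable, Tuple


def verifiable_reward(candidate: Callable[[int], int],
                      cases: Iterable[Tuple[int, int]]) -> float:
    """Hypothetical reward: fraction of test cases the candidate passes."""
    cases = list(cases)
    passed = 0
    for arg, expected in cases:
        try:
            if candidate(arg) == expected:
                passed += 1
        except Exception:
            # A crashing candidate fails that case; the grader keeps going.
            pass
    return passed / len(cases) if cases else 0.0


# Two hypothetical model outputs for the task "square the input".
def good_square(x: int) -> int:
    return x * x


def buggy_square(x: int) -> int:
    return x + x  # only right when x + x happens to equal x * x


# Hypothetical (input, expected output) pairs acting as the verifier.
CASES = [(0, 0), (2, 4), (3, 9), (-4, 16)]

if __name__ == "__main__":
    print(verifiable_reward(good_square, CASES))
    print(verifiable_reward(buggy_square, CASES))
```

Scoring a candidate here costs a few function calls. For something like an essay or a legal memo there is no equivalent of `CASES` to run, which is where the human judges come in.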

knollimar a day ago

There's no diff of my Excel lambdas being fixed? :(

emp17344 a day ago

RLVR doesn’t work for unverifiable tasks, so they won’t be able to effectively use tools to boost reliability for those tasks.

jeremyjh 4 hours ago

Right, so you have to use RLHF. That is the economics problem I was referring to.