Hacker News

srush 4 days ago

There is a footnote that should help with the models. Training is a harder thing to report on, but roughly our finding here is that RL scales.