It's for any task that has an "eval", which often means verifiable tasks or ones that can be judged by LLMs (e.g. see [0]). There's also been recent work such as BRPO [1] and similar approaches to give more and more "non-verifiable" tasks verifiable rewards!
There needs to be some way of automatically assessing performance on the task, but this could be a Python function, another LLM acting as a judge, or a combination of the two!
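For concreteness, here's a minimal sketch of what such a reward function might look like: a verifiable, programmatic check combined with an LLM-judge score. The names (`judge_score`, `reward`) and the specific checks are illustrative, not any particular framework's API, and the judge is a stand-in you'd replace with a real call to a judge model.

```python
import re

def judge_score(prompt: str, completion: str) -> float:
    """Placeholder for an LLM-as-judge call; a real version would ask a
    judge model to rate the completion and parse a score in [0, 1]."""
    # Hypothetical stand-in so the sketch runs: reward any non-empty answer.
    return 1.0 if completion.strip() else 0.0

def reward(prompt: str, completion: str) -> float:
    # Verifiable check, e.g. the answer must end with a boxed integer.
    fmt_ok = 1.0 if re.search(r"\\boxed\{-?\d+\}$", completion.strip()) else 0.0
    # Blend the programmatic check with the judge however suits the task.
    return 0.5 * fmt_ok + 0.5 * judge_score(prompt, completion)

print(reward("What is 2+2?", r"The answer is \boxed{4}"))  # -> 1.0
```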
[0]: https://runrl.com/blog/funniest-joke
[1]: https://arxiv.org/abs/2506.00103