Hacker News

lebron72 3 months ago

This paper looks pretty groundbreaking. The ability to verify LLMs at scale (e.g., 70B parameters) on real-world tasks like math reasoning and code security is extremely impressive and impactful.