> the extrapolation should have been straightforward,

Indeed. "By late next month you'll have over four dozen husbands" https://xkcd.com/605/

> So, here's some support for my second claim:

I don't think any of these links support the claim that "No SOTA model was trained without verified reward training for math and programming"

https://arxiv.org/abs/2507.06920: "We hope this work contributes to building a scalable foundation for reliable LLM code evaluation"

https://arxiv.org/abs/2506.11425: A custom agent with a custom environment and a custom training dataset of ~800 predetermined problems. Also: "Our work is limited to Python"

https://arxiv.org/abs/2502.06807: The only one that even obliquely supports your claim