Hacker News

at2005 9 days ago

Ah, I meant that MCTS uses more inference-time compute than GRPO to produce a training sample.