Y
Hacker News
new
|
ask
|
show
|
jobs
at2005
9 days ago
[
-
]
Ah, I meant that MCTS uses more inference-time compute (over GRPO) to
produce
a training sample
Please enable JavaScript to continue using this application.