Ah, I meant that MCTS uses more inference-time compute than GRPO to produce a training sample.