I think you meant to link to
Implicit Actor Critic Coupling via a Supervised Learning Framework for RLVR https://arxiv.org/abs/2509.02522
not
Winning Gold at IMO 2025 with a Model-Agnostic Verification-and-Refinement Pipeline https://arxiv.org/abs/2507.15855
We've changed the top link from https://arxiv.org/abs/2507.15855 to that paper. Thanks!
That paper is really cool too, though. I'm glad your comment records the old link, because I only ever saw the right paper.
Ack, thank you.