Fantastic. Could you share more details what it was like post-training a model?

The RL is easy to describe, hard to do. The nice thing about pen testing is the reward isn't a vibe like training for code quality, the exploit either lands or it doesn't. The day to day is not glamorous at all, mostly fighting for stable gpu access, watching a cluster sit half-idle with nodes you somehow can't book.