How many times have you needed to reset the optimizer during RL training cycles?