This ignores that reinforcement learning radically changes the training objective