Don't they add a KL loss term against the frozen model's outputs?