I was talking to an old colleague/friend about distillation, trying to understand how to steer distillation with regards to removing irrelevant regions of a larger model when training a smaller model. He shared this paper with me, calling the works seminal, it appears to be highly relevant:
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model