Doesn't "real" distillation use the logits instead of the final tokens? I would classify this more like using a model to generate synthetic training data.

Distillation is a category of techniques that, generally speaking, all extract knowledge from a source (teacher) model to train a new (student) model. Logit distillation requires access to the source model's output logits, i.e. its full probability distribution over the vocabulary at each step; final-token distillation doesn't. The former is generally more effective, but the latter can be done with the generated tokens alone.
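
To make the difference concrete, here is a rough sketch of the two training signals for a single next-token prediction. It's plain Python with made-up logits over a three-token vocabulary; the function names, the temperature of 2.0, and the greedy pick of the teacher's token are all my own assumptions for illustration, not taken from the linked article or any particular library.

```python
import math

def softmax(logits, temperature=1.0):
    # Numerically stable softmax over temperature-scaled logits.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def logit_distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # Needs the teacher's full logit vector: KL(teacher || student) on
    # temperature-softened distributions, scaled by T^2 as is conventional.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

def token_distillation_loss(teacher_token_id, student_logits):
    # Needs only the token the teacher emitted: ordinary cross-entropy,
    # i.e. the same loss as training on synthetic data.
    q = softmax(student_logits)
    return -math.log(q[teacher_token_id])

# Hypothetical logits for one position (illustrative values only).
teacher_logits = [2.0, 1.5, -1.0]
student_logits = [0.5, 0.2, 0.1]

# Assumption: the teacher decodes greedily, so its "final token" is the argmax.
teacher_token = max(range(len(teacher_logits)), key=teacher_logits.__getitem__)

soft = logit_distillation_loss(teacher_logits, student_logits)
hard = token_distillation_loss(teacher_token, student_logits)
print(f"logit distillation loss: {soft:.4f}")
print(f"token distillation loss: {hard:.4f}")
```

Note what the token version throws away: the teacher rated token 1 almost as highly as token 0, but the student only ever sees "token 0 was chosen". The logit version passes that near-tie along, which is the usual argument for why it carries more signal per example.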

This article explains the difference (and addresses "they're distillin' our models!!!"): https://dev.to/p0rt/how-model-distillation-actually-works-an...