DeepSeek was trained with distillation. Any accurate estimate of training costs should include the training costs of the model that it was distilling.
That makes the calculation nonsensical, because if you go there, you'd also have to include all the energy used to produce the content the other model providers trained on. Suddenly you're counting everyone's devices on which they wrote comments on social media, pretty much every server that ever handled a request from OpenAI's, Google's, or Anthropic's crawlers, and so on.
Seriously, that claim was always completely disingenuous.
I don't think it's that nonsensical to recognize that in order to have AI, you need generations of artists, journalists, scientists, and librarians producing materials to learn from.
And when you're using an actual AI model to "train" (i.e., copy), it's not nonsense at all to recognize that the prior model is a core component of the training.
Not just the energy cost, but also the licensing cost of all that content…