Hacker News

I'm from Spain and I also hate these projects with passion. Creating models that speak multiple languages is a solved problem. Having each European Nation train its own useless "sovereign model" in its own language is a total waste of time and resources when we could pool resources and give it a try to training SOTA models that speak in all European languages.

I'd rather have smaller european labs try to give it a go at distributed training. If multiple countries got together and said, "look, we tried training a distributed model that speaks in all of our local languages and that is comparable to 1-year-old Chinese open-source models", that, at least, I would find interesting.

Zambyte 8 hours ago [ - ]

Excuse my ignorance if by "distributed training" you mean a specific process, but couldn't this be considered a step toward distributed training? If nations train models independently and then later distill them into a single model, all the work (both the compute and the research processes) are distributed for the initial training phase.

andy12_ 7 hours ago [ - ]

I mean it as in, train a model across different clusters instead of a centralized cluster. It's been shown that it's possible to train 10B models this way. If more research effort was put into this, that would be great

I don't think your approach would work because you can't create a strong model from distilling several weak models.

https://www.primeintellect.ai/blog/intellect-1

https://www.primeintellect.ai/blog/intellect-2-release