In my opinion, we need more models trained on fully traceable and clean data instead of closed models that we later find out were trained on Reddit and Facebook discussion threads.
I want to see something trained _only_ on stuff like encyclopedias, programming books, etc. I'm interested in how different it would be compared to something with a lot of social media in it.
It's better to do a fine-tune or a LoRA than a full retraining from scratch.