I want to see a model trained _only_ on stuff like encyclopedias, programming books, etc. I'm interested in how different it would be from one trained on a lot of social media.

Better to do a fine-tune or a LoRA than a full retraining from scratch.
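For a rough sense of why a LoRA is so much cheaper, here is a back-of-the-envelope sketch. It assumes a single square weight matrix of size 4096 x 4096 and a LoRA rank of 8; both numbers are made up for illustration, not taken from any particular model. A LoRA freezes the original matrix and trains two thin matrices of shapes `(d_out, rank)` and `(rank, d_in)` instead.

```python
# Illustrative sizes only (assumptions, not from a specific model).
D_IN = 4096
D_OUT = 4096
RANK = 8


def full_params(d_in: int, d_out: int) -> int:
    """Trainable weights when the whole d_out x d_in matrix is updated."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable weights for a LoRA: B is (d_out, rank), A is (rank, d_in)."""
    return rank * (d_in + d_out)


full = full_params(D_IN, D_OUT)
lora = lora_params(D_IN, D_OUT, RANK)

print(f"full update: {full} trainable weights")
print(f"LoRA update: {lora} trainable weights")
print(f"LoRA / full: {lora / full:.4%}")
```

This only counts trainable weights for one matrix. It says nothing about how the resulting model would behave, and it doesn't cover the cost of the data or of training from scratch.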