I think you are making a distinction between pretraining and later stages? The value of, e.g., Fable output is exactly the careful preference optimization embedded in those responses. Not all data is the same (sorry if my first comment was sloppy on that).