I am aware of RLHF, and no, it doesn’t solve this problem.

There are a great many lessons to be learned from X PB of training data that wouldn’t be covered.