Hacker News

I agree I left out option 0, but I think the other two were presented correctly?

- Black-box distillation trains the new model on the teacher's direct answers to questions and on its conversational style. This is less useful: you are still doing supervised fine-tuning on those answers, which may be wrong, and imitating them doesn't lead to greater insights (which reinforcement learning does).

- RLAIF (reinforcement learning from AI feedback) relies on preferences and values to judge answers. It doesn't need supervised fine-tuning on specific answers, and it guides the new model toward better answers in general rather than just correcting specific, previously asked ones.
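To make the difference in training signal concrete, here's a toy Python sketch under heavy simplifying assumptions. There is one question, a fixed menu of three candidate answers, and a "model" that is just one logit per answer. `ANSWERS`, `judge`, `distill_step` and `rlaif_step` are hypothetical names for illustration only. The distillation step is supervised fine-tuning (cross-entropy) toward whatever the teacher said, so a wrong teacher answer gets copied. The RLAIF step does gradient ascent on the expected score from an AI judge that encodes preferences. Real RLAIF pipelines typically train a reward model on AI-labelled preference pairs and then optimize with an RL algorithm such as PPO. Here the judge's scores are used directly, and the exact expected policy gradient replaces sampling so the result is deterministic.

```python
import math

# Hypothetical toy setup: one question, a fixed menu of candidate answers,
# and a "model" that is just one logit per answer.
ANSWERS = ["42", "41", "I don't know"]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distill_step(logits, teacher_answer, lr=0.5):
    """Black-box distillation as supervised fine-tuning: one gradient-descent
    step on the cross-entropy loss -log p(teacher_answer)."""
    probs = softmax(logits)
    target = ANSWERS.index(teacher_answer)
    # d(-log p_target)/d z_i = p_i - 1[i == target]
    return [z - lr * (p - (1.0 if i == target else 0.0))
            for i, (z, p) in enumerate(zip(logits, probs))]


def judge(answer):
    """Hypothetical AI judge standing in for preference/value-based feedback:
    a higher score means a better answer."""
    return {"42": 1.0, "41": 0.0, "I don't know": 0.3}[answer]


def rlaif_step(logits, lr=0.5):
    """RLAIF-style step: gradient ascent on the expected judge score.
    Uses the exact policy gradient p_i * (r_i - E[r]) instead of sampling."""
    probs = softmax(logits)
    rewards = [judge(a) for a in ANSWERS]
    expected = sum(p * r for p, r in zip(probs, rewards))
    return [z + lr * p * (r - expected)
            for z, p, r in zip(logits, probs, rewards)]


def best_answer(logits):
    """Answer the model currently assigns the highest probability."""
    probs = softmax(logits)
    return ANSWERS[max(range(len(ANSWERS)), key=lambda i: probs[i])]


# Both students start out indifferent between the three answers.
distilled = [0.0, 0.0, 0.0]
rlaif = [0.0, 0.0, 0.0]
for _ in range(50):
    distilled = distill_step(distilled, teacher_answer="41")  # teacher is wrong
    rlaif = rlaif_step(rlaif)

print("distilled student prefers:", best_answer(distilled))
print("RLAIF student prefers:", best_answer(rlaif))
```

With a teacher that answers "41", the distilled student ends up preferring "41", while the RLAIF student moves toward "42" because that is what the judge prefers. This only illustrates where each training signal comes from, not how either method behaves on real models.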