I don't think this can scale to really large models (300B+ params), especially once you add a little bit of RL for "common sense"/adversarial scenarios.