Most managed finetuning offerings take your dataset, some hyperparameters, and spit out a model. Few support RL, and those that do have very limited support.
And I have gotten a real lift in both cost-effectiveness and engagement (for creative writing).
What are you doing that requires RL-ing creative writing for engagement?
I don't apply RL directly to engagement (and I don't think that's really possible without some insane scale of feedback).
Instead, I target mechanical mistakes that models make, which harm engagement and are trivially verifiable (overused phrases and concepts, missing a given target reading level, etc.).
Improving those is what improves engagement.
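
For a rough idea of what a "trivially verifiable" reward along these lines might look like, here is a minimal sketch. Everything in it is an assumption for illustration, not the actual setup: the phrase list, the weights, the function names, and the use of the Flesch-Kincaid grade formula (with a crude syllable heuristic) as the reading-level measure.

```python
import re

# Hypothetical phrase list; a real one would be mined from your own model's outputs.
OVERUSED_PHRASES = ["a testament to", "shivers down", "barely above a whisper"]


def count_syllables(word: str) -> int:
    """Crude syllable estimate: count vowel groups, drop a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level; returns 0.0 for empty text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


def overused_phrase_count(text: str) -> int:
    """Number of occurrences of any listed phrase (case-insensitive)."""
    lowered = text.lower()
    return sum(lowered.count(phrase) for phrase in OVERUSED_PHRASES)


def reward(text: str, target_grade: float = 6.0) -> float:
    """Higher is better; 0.0 means no listed phrases and an exact grade match.

    The weights (1.0 per phrase hit, 0.25 per grade level of error) are
    arbitrary placeholders.
    """
    grade_error = abs(flesch_kincaid_grade(text) - target_grade)
    return -1.0 * overused_phrase_count(text) - 0.25 * grade_error
```

A score like this can be computed for every sampled completion without any human feedback, which is what makes it usable as an RL signal. Whether these particular checks track engagement is something you would have to confirm on your own data.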