I don't apply RL directly to engagement (and I don't think that's really possible without some insane scale of feedback).
Instead, there are mechanical mistakes models make that harm engagement and are trivially verifiable (overused phrases and concepts, missing a given target reading level, etc.).
Fixing those is what improves engagement.
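
As a rough illustration of what "trivially verifiable" can look like, a reward can be computed from plain string checks with no human feedback in the loop. This is only a sketch under assumptions, not the actual setup: the phrase list, the target grade, the tolerance, and all function names are hypothetical, and the syllable counter is a crude heuristic feeding the standard Flesch-Kincaid grade formula.

```python
import re

# Hypothetical list for illustration; a real one would be mined from model outputs.
OVERUSED_PHRASES = ("delve into", "tapestry", "testament to")


def count_syllables(word: str) -> int:
    """Crude heuristic: count runs of vowels, with a minimum of one per word."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def reading_grade(text: str) -> float:
    """Flesch-Kincaid grade level, using the heuristic syllable counter above."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


def overused_count(text: str) -> int:
    """Number of occurrences of any listed phrase (case-insensitive)."""
    lowered = text.lower()
    return sum(lowered.count(phrase) for phrase in OVERUSED_PHRASES)


def reward(text: str, target_grade: float = 8.0, tolerance: float = 1.0) -> float:
    """Zero when there are no listed phrases and the grade is within tolerance;
    negative otherwise. The weighting is an arbitrary assumption."""
    phrase_penalty = overused_count(text)
    grade_miss = max(0.0, abs(reading_grade(text) - target_grade) - tolerance)
    return -(phrase_penalty + grade_miss)
```

Both checks are deterministic and cheap, which is the point: they can be run on every sample, unlike an engagement signal.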