Hacker News

virangjhaveri 10 hours ago

Do you reward the RL model based on token consumption when multiple LLMs complete the task?