What is this comment? It’s an RL paper; these are standard RL terms.
It's a comment. On Hacker News. Not the RL subreddit, or whatever. I'm just amazed at the jargon. I'm sure it's useful, but one could just call it model output.
https://en.wikipedia.org/wiki/Reinforcement_learning#Policy
> one could just call it model output.
That would be incorrect. My other reply attempts to address this.
But the probability vector is the output of the LLM, no?