Very neat! A) If I want to have a different grading rubric per example (and grade with an LLM as a judge), do I do this through the reward function? B) What's the pricing on the deployed API? (Is it per token?)
A) You could add an extra field to each line of the jsonl file saying which rubric to use; your reward function can then read it via `kwargs["rubric"]` and return a reward based on that example's preferred rubric.
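A minimal sketch of what that could look like. The function names, rubric keys, and the `llm_judge_score` helper are all hypothetical (a real setup would call an actual LLM judge with the rubric text; here a toy keyword scorer stands in so the example runs offline):

```python
# Hypothetical rubric lookup table; in practice each jsonl row would
# carry a "rubric" field naming one of these.
RUBRICS = {
    "concise": "Reward short answers.",
    "detailed": "Reward thorough answers.",
}

def llm_judge_score(completion: str, rubric_text: str) -> float:
    # Stand-in for a real LLM-as-judge call (assumption, not a real API).
    # Toy heuristic: score by length according to the rubric text.
    if "short" in rubric_text:
        return 1.0 if len(completion.split()) <= 10 else 0.0
    return 1.0 if len(completion.split()) > 10 else 0.0

def reward_fn(completion: str, **kwargs) -> float:
    # The extra jsonl field arrives here via kwargs, as described above.
    rubric_text = RUBRICS[kwargs["rubric"]]
    return llm_judge_score(completion, rubric_text)

print(reward_fn("Yes.", rubric="concise"))  # 1.0
```

The key point is just the routing: the per-example field travels with the example and the reward function branches on it.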
B) Currently the deployed API is free, but the startup time is a few minutes and it runs on a small GPU node, so it isn't terribly fast. If you'd like more production-grade inference, email us at founders@runrl.com and we can set you up with something much faster (where we'd charge per token, depending on model size).