Thanks! In practice it can be a fraction of that. The judge sees a much smaller slice than the original call, and it's usually a smaller model, which makes it 10-20x cheaper than the model running your agent. In some cases we've also seen folks who were using a frontier model mainly for reliability end up comfortable downgrading their agent model to a cheaper one. We'll add a proper cost breakdown in the docs!