I think it would be expensive to check. For a coding task, any reviewer would need to understand programming (these people aren't cheap), the domain context, and cultural differences (e.g. American "cookie" vs British "biscuit"), and then make a determination.

If the AI companies just paid all of that out of the goodness of their pocketbook I'd be fine with it, but in reality I think they'd just pass on the costs, the same way that basically every business passes on spoilage, theft, return rates, etc. So I think the value would be risk mitigation rather than cost savings. As in, you'd know that if you pay for $10 worth of tokens, it will be $10 worth of good tokens, but the individual token price would need to account for all the tokens that the company doesn't get paid for.
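
To put rough numbers on that, here's a back-of-the-envelope sketch. Everything in it is an assumption for illustration: the function name, the simple break-even model, and the figures are made up, not anything a real provider has published. It just shows how the per-token price has to rise once some share of tokens is refunded and each refund needs a paid reviewer.

```python
def break_even_price(base_price, refund_rate, review_cost=0.0):
    """Price per unit of tokens needed to break even under a refund policy.

    Hypothetical model (an assumption, not any provider's real pricing):
      base_price   - what the company needs to earn per unit of tokens served
      refund_rate  - fraction of served tokens that end up refunded (0 <= r < 1)
      review_cost  - reviewer cost per unit of refunded tokens
    Only the non-refunded share (1 - r) brings in revenue, so that share
    has to cover the cost of everything served plus the review work.
    """
    if not 0 <= refund_rate < 1:
        raise ValueError("refund_rate must be in [0, 1)")
    total_cost = base_price + review_cost * refund_rate
    return total_cost / (1 - refund_rate)


# Made-up numbers: $10 per million tokens, 20% refunded,
# $5 of reviewer time per million refunded tokens.
price = break_even_price(10.0, 0.20, review_cost=5.0)
print(f"${price:.2f} per million tokens")  # roughly $13.75
```

So in this toy case you'd pay about 37% more per token in exchange for knowing the tokens you pay for are good ones, which is the risk-mitigation trade I mean.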