Because judging failure is itself a complex task, one that may require an expensive model.