There is some evidence.[1] The best reviewer is a different model with fresh context; the worst is the same model with the same context.

1. https://arxiv.org/pdf/2603.04582