Take any model at any reasoning level, ask it to tackle a challenge, and have it come up with a plan. Then ask it, "Are you sure? This feels wrong," and it will now think it's wrong. Do that again in a loop and you'll see how unnecessary human judgment actually is.

Alternatively, have fable write some complex code, then ask it to do an adversarial review of that code in a clean session. You'll find that it finds issues in the code it just wrote.

Now imagine you're a layperson who doesn't know which one is true.

Human expertise is never going to become irrelevant.

Yeah, fair. I have the same problem when I ask an LLM to prove the Riemann hypothesis. I'm not mathematically mature enough to tell whether its approach might yield any insight.