The examples matter for the correctness.

This example is a little flawed, but go with me. Imagine someone was making an argument that an algorithm is too powerful because it can solve really hard problems, then list a bunch of problems that need ridiculous amounts of compute time to solve, but half their examples are NP-hard and half their examples are P.

The intuition that says "wow, that problem is very difficult to solve, so I'm very skeptical of a solution" is wrong. Because that intuition applies to both the NP examples and the P examples. That intuition is too simplistic and overgeneral.

You need an intuition that is right with both classes of problem. It has to say "no" to one class and "yes" to another. Ignoring the wrong examples is not how you evaluate an intuition.