The problem is even harder than you make it look: even if the model founds plenty of “I don't know” answer in its training corpus it doesn't mean that this is the desirable answer to the questions: the model can know the answer even if one person on the internet doesn't.
“I don't know” must be derived from the model's knowledge as a whole, not from individual question/anser pairs in training.