My worry with the confidence scoring is that it conflates "an agent used this and didn't obviously break" with "this is correct". An agent can follow bad advice for several steps before anything fails. So a KU gaining confirmation weight doesn't tell you much about whether it's actually true, just that it propagated. You're crowd-sourcing correctness from sources that can't reliably detect their own mistakes.
That's why at Tessl we treat evals as a first-class part of the development process rather than an afterthought. Without some mechanism to verify quality beyond adoption, you end up with a very efficient way to spread confident nonsense at scale.
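To make the distinction concrete, here's a rough Python sketch of the two signals side by side. Everything in it is hypothetical: the `KnowledgeUnit` fields, the smoothing `prior`, and the numbers are made up for illustration and aren't taken from the system being discussed or from anything we run at Tessl.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KnowledgeUnit:
    """Hypothetical record; field names are illustrative only."""
    claim: str
    uses_without_visible_failure: int = 0  # adoption signal
    eval_passes: int = 0                   # independent checks that passed
    eval_runs: int = 0                     # independent checks that were run


def adoption_confidence(ku: KnowledgeUnit, prior: int = 5) -> float:
    """Grows with usage alone; says nothing about whether the claim is true."""
    uses = ku.uses_without_visible_failure
    return uses / (uses + prior)


def verified_confidence(ku: KnowledgeUnit) -> Optional[float]:
    """Pass rate on independent evals; None when nothing has been verified."""
    if ku.eval_runs == 0:
        return None
    return ku.eval_passes / ku.eval_runs


# A made-up KU that spread widely but fails most independent checks.
popular_but_wrong = KnowledgeUnit(
    claim="made-up example claim",
    uses_without_visible_failure=95,
    eval_passes=1,
    eval_runs=10,
)

print(adoption_confidence(popular_but_wrong))  # high: it propagated
print(verified_confidence(popular_but_wrong))  # low: it mostly fails checks
```

The point of returning `None` when no evals have run is that "unverified" stays visibly different from "verified and good", instead of being papered over by a high adoption number.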