Manual verification that the "judge" judges correctly.

Also, how exactly do you programmatically validate CVEs?

Most open-source CVEs have a patch linked in their disclosure. You can recover the vulnerable code from that patch's git diff (the lines the fix removes or changes), then check whether the LLM's finding points at that code.
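A minimal sketch of that check in Python. It assumes the patch is available as a unified diff (e.g. the output of `git diff <fix-commit>^ <fix-commit>`) and that the LLM's finding has already been reduced to a file path plus a line range in the pre-patch file. `Finding`, `vulnerable_lines`, and `finding_hits_patch` are hypothetical names for illustration, not an existing tool.

```python
import re
from dataclasses import dataclass

# Unified-diff hunk header, e.g. "@@ -10,4 +10,5 @@ int parse(char *buf)".
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


@dataclass
class Finding:
    # Hypothetical shape: a file path plus a line range in the pre-patch file.
    path: str
    start: int
    end: int


def vulnerable_lines(diff_text: str) -> dict[str, set[int]]:
    """Map each file in the patch to the pre-patch line numbers it removes or changes."""
    result: dict[str, set[int]] = {}
    path = None
    old_line = old_left = new_left = 0
    for line in diff_text.splitlines():
        if old_left > 0 or new_left > 0:  # inside a hunk body
            if line.startswith("\\"):  # "\ No newline at end of file"
                continue
            tag = line[:1]
            if tag == "-":  # exists only in the vulnerable version
                if path is not None:
                    result.setdefault(path, set()).add(old_line)
                old_line += 1
                old_left -= 1
            elif tag == "+":  # exists only in the fixed version
                new_left -= 1
            else:  # context line, present in both versions
                old_line += 1
                old_left -= 1
                new_left -= 1
            continue
        if line.startswith("--- "):  # "--- a/src/parse.c"; "/dev/null" for new files
            name = line[4:].split("\t")[0]
            path = None if name == "/dev/null" else name.removeprefix("a/")
        elif m := HUNK_RE.match(line):
            old_line = int(m.group(1))
            old_left = int(m.group(2) or 1)  # count defaults to 1 when omitted
            new_left = int(m.group(4) or 1)
    return result


def finding_hits_patch(finding: Finding, diff_text: str) -> bool:
    """True if the finding's line range overlaps a line the patch removed or changed."""
    lines = vulnerable_lines(diff_text).get(finding.path, set())
    return any(finding.start <= n <= finding.end for n in lines)


# Made-up example patch, for illustration only.
SAMPLE_DIFF = """\
diff --git a/src/parse.c b/src/parse.c
--- a/src/parse.c
+++ b/src/parse.c
@@ -10,4 +10,5 @@ int parse(char *buf)
 {
     char tmp[16];
-    strcpy(tmp, buf);
+    strncpy(tmp, buf, sizeof(tmp) - 1);
+    tmp[sizeof(tmp) - 1] = 0;
     return 0;
"""

if __name__ == "__main__":
    print(vulnerable_lines(SAMPLE_DIFF))  # {'src/parse.c': {12}}
    print(finding_hits_patch(Finding("src/parse.c", 11, 13), SAMPLE_DIFF))  # True
```

One limitation: a fix that only adds lines (say, a new bounds check) has no `-` lines, so `vulnerable_lines` returns nothing for that file. A fuller version might also count the context lines around each hunk, or fall back to matching at the function level. Path matching here is exact, so the finding's path must be normalized the same way as the diff's.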