There is a big difference between seeing that some pairs of input and output are correct and knowing that the system is correct. There are infinitely many input/output pairs, and checking only a few of them while you vibe code your tool is never going to be a reliable method.
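To make that concrete, here is a small sketch in Python. The function `is_leap_year` and the chosen spot checks are made up for illustration. It is deliberately buggy, yet every hand-picked pair passes.

```python
import calendar

def is_leap_year(year: int) -> bool:
    # Hypothetical, deliberately buggy implementation: it ignores the century rules.
    return year % 4 == 0

# A handful of hand-picked input/output pairs. All of them pass.
spot_checks = {2020: True, 2023: False, 2024: True, 2000: True}
spot_checks_pass = all(
    is_leap_year(year) == expected for year, expected in spot_checks.items()
)

# Comparing against a trusted reference over a wider range exposes the bug.
counterexamples = [
    year for year in range(1800, 2101)
    if is_leap_year(year) != calendar.isleap(year)
]

print(spot_checks_pass)   # True
print(counterexamples)    # [1800, 1900, 2100]
```

Note that even the wider comparison is still only a finite sample. It finds this particular bug, but it does not prove the function correct for every year.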