Tests don’t prove correctness of the code. What you’d really want instead is to specify invariants the code has to fulfill, and for the AI to come up with a machine-checkable proof that the code indeed guarantees those invariants.