> ok but how are you sure that the AI is correctly turning the spec into tests?

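You generate the tests from the spec, and then you review the generated tests against the spec yourself, the same way you would review any other change before merging it.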
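One way to make that review easier is to have each generated test say which spec item it claims to cover, so gaps show up mechanically and you only need to read the test bodies for correctness. This is a rough sketch, not a description of any particular tool: the `SPEC` entries, the `slugify` function, the `covers` attribute, and the helper names are all made up for illustration.

```python
# Hypothetical spec: each requirement has an ID and a plain-language statement.
SPEC = {
    "REQ-1": "slugify lowercases its input",
    "REQ-2": "slugify replaces runs of whitespace with a single hyphen",
    "REQ-3": "slugify strips leading and trailing whitespace",
}


def slugify(text: str) -> str:
    """Hypothetical function under test."""
    return "-".join(text.lower().split())


# Generated tests. Each one is tagged with the requirement it claims to cover,
# so a reviewer can compare the assertion against the spec wording.
def test_lowercases():
    assert slugify("Hello") == "hello"


test_lowercases.covers = "REQ-1"


def test_whitespace_runs():
    assert slugify("a   b") == "a-b"


test_whitespace_runs.covers = "REQ-2"

GENERATED_TESTS = [test_lowercases, test_whitespace_runs]


def uncovered_requirements(spec, tests):
    """Return the IDs of spec items that no test claims to cover."""
    covered = {t.covers for t in tests}
    return sorted(set(spec) - covered)


def run_tests(tests):
    """Run every test; an AssertionError means a test and the code disagree."""
    for t in tests:
        t()


if __name__ == "__main__":
    run_tests(GENERATED_TESTS)
    print("missing coverage:", uncovered_requirements(SPEC, GENERATED_TESTS))
```

Here the check flags `REQ-3` as having no test, which is the kind of thing a reviewer would otherwise have to spot by eye. It does not tell you whether a test's assertion actually matches the requirement it is tagged with; that part is still the human review.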