Ok. I also have the intuition that more tests and formal specifications can help there.
So far, my biggest issue is that when the generated code is incorrect in a subtle way, I feel I have wasted time prompting for something I should have written myself, because now I have to understand it deeply to debug it.
If the test infrastructure is sound, then maybe there is a gain after all, even when the generated code is wrong.
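
To make that concrete, here is a minimal sketch of what I have in mind, assuming the "specification" is just a slow but obviously correct reference function. All names here (`reference_max_subarray`, `candidate_max_subarray`, `find_counterexample`) are hypothetical and made up for illustration; the candidate stands in for generated code with a subtle bug.

```python
import random


def reference_max_subarray(xs):
    # Slow, obviously correct spec: best sum over every non-empty slice.
    return max(
        sum(xs[i:j])
        for i in range(len(xs))
        for j in range(i + 1, len(xs) + 1)
    )


def candidate_max_subarray(xs):
    # Stand-in for generated code (hypothetical). Subtle bug: `best` starts
    # at 0, so an input with only negative numbers wrongly yields 0.
    best = current = 0
    for x in xs:
        current = max(x, current + x)
        best = max(best, current)
    return best


def find_counterexample(candidate, reference, trials=1000, seed=0):
    # Compare candidate and reference on small random inputs and return
    # the first input where they disagree, or None if none is found.
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-5, 5) for _ in range(rng.randint(1, 6))]
        if candidate(xs) != reference(xs):
            return xs
    return None


if __name__ == "__main__":
    print(find_counterexample(candidate_max_subarray, reference_max_subarray))
```

The point is that the failing input is small and concrete, so I would not have to read the whole generated function to start debugging it. Of course, this only helps to the extent that the reference itself can be trusted.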