I think a big issue with this approach is that the first and last steps are prone to sycophancy: the model wants you to believe it's getting the job done, which may lead it to produce something correct-looking over something correct. The middle steps (correct-by-construction transformations) don't need an LLM at all; that's what a certified compiler does.

I think the way forward, for the immediate future, is to feed AI agents a mixture of (hand-written) natural-language and formal blueprints, then run as many mechanized analysis methods as possible on the generated code (from unit/regression testing to static analysis, and possibly more powerful software-verification procedures), and potentially feed the output of these analyses back to the agents.
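The generate-analyze-feedback loop described above can be sketched in a few lines. This is a minimal illustration, not a real pipeline: `agent` stands in for any code-generating model, `run_checks` here only does a compile check (a real version would add test runs, static analyzers, and verifiers), and the names `refine` and `max_rounds` are my own invention.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

def run_checks(source: str) -> list[str]:
    """Run mechanized analyses on generated code and collect findings.
    Placeholder: only a syntax/compile check; a real pipeline would also
    run unit tests, static analysis, and verification tools."""
    findings = []
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "candidate.py"
        path.write_text(source)
        proc = subprocess.run(
            [sys.executable, "-m", "py_compile", str(path)],
            capture_output=True, text=True,
        )
        if proc.returncode != 0:
            findings.append("compile error: " + proc.stderr.strip())
    return findings

def refine(agent, spec: str, max_rounds: int = 3) -> str:
    """Ask the agent for code, then feed analysis findings back to it
    until the checks pass or we run out of rounds."""
    prompt = spec
    code = agent(prompt)
    for _ in range(max_rounds):
        findings = run_checks(code)
        if not findings:
            break
        prompt = spec + "\n\nAnalysis findings:\n" + "\n".join(findings)
        code = agent(prompt)
    return code
```

The key design point is that the agent never gets to grade its own work: the findings come from deterministic tools, so a sycophantic "looks done" answer still has to survive compilation and testing.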