This is just img2img, where the first image with the correct structure was generated by code.

Yup, that’s exactly what this is. If you’ve been using generative models since the early Stable Diffusion days, it’s a pretty common (and useful!) technique: using a sketch (SVG, hand-drawn, etc.) as an ad-hoc "ControlNet" to guide the generative model’s output.

Example: In the past I'd use a similar approach to lay out architectural visualizations. If you wanted a couch, chair, or other furniture in a very specific location, you could use a tool like Poser to build a simple scene as an approximation of where you wanted the major "set pieces". From there, you could generate a depth map and feed it into the generative model (SDXL, at the time) to guide where objects should be placed.
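A minimal sketch of that workflow, assuming the `diffusers` library. The depth buffer here is a synthetic stand-in for one exported from a 3D tool like Poser, and the model IDs in the commented-out part are assumptions (the usual Hugging Face names), not something from the original comment:

```python
import numpy as np
from PIL import Image

def depth_to_control_image(depth: np.ndarray) -> Image.Image:
    """Normalize a raw depth buffer (near = small values, far = large)
    to an 8-bit grayscale image, inverted so nearer objects are
    brighter, matching the common MiDaS-style depth-map convention."""
    d = depth.astype(np.float32)
    d = (d - d.min()) / max(float(d.max() - d.min()), 1e-8)
    d = 1.0 - d  # invert: near -> bright, far -> dark
    return Image.fromarray((d * 255).astype(np.uint8), mode="L")

# Fake depth buffer: a "couch" block close to the camera, far wall behind.
depth = np.full((64, 64), 10.0)   # background wall, 10 units away
depth[40:60, 10:40] = 3.0         # furniture block, 3 units away
control = depth_to_control_image(depth)

# The control image then conditions generation. This part needs a GPU
# and a multi-GB model download, so it's shown only as a sketch:
#
# from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel
# controlnet = ControlNetModel.from_pretrained(
#     "diffusers/controlnet-depth-sdxl-1.0")
# pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
#     "stabilityai/stable-diffusion-xl-base-1.0", controlnet=controlnet)
# image = pipe("a modern living room with a couch",
#              image=control.convert("RGB").resize((1024, 1024))).images[0]
```

The point is that the depth map carries only layout, not style, so the model is free to render materials and lighting however the prompt asks while keeping the furniture where you put it.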

Pretty much what the author said; just adding some context for the uninitiated.

Right, but you can use a different (codegen) model to make that code.