I treat LLMs as a statistics-driven compression of knowledge and problem-solving patterns.
If you treat them as such, it becomes understandable where they might fail and where you might have to guide them.
Also treat them as something that has been biased during training to produce immediately impressive results. This is why they bundle everything into single files and write try/catch patterns where the catch returns mock data, all to pull off an impressive one-shot demo.
So you have to actively fight against these tendencies to make them prioritise the scalability of the codebase and solutions.
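
To make the try/catch point concrete, here is a rough sketch in Python of what I mean. All names (`failing_loader`, `get_user_demo_style`, `get_user_strict`, `UserLookupError`) are made up for illustration, and the "strict" version is just one reasonable alternative, not the only way to handle it.

```python
from typing import Callable


class UserLookupError(Exception):
    """Hypothetical error raised when the real data source fails."""


def failing_loader(user_id: int) -> dict:
    # Stand-in for a real data source that happens to be down.
    raise ConnectionError("database unreachable")


def get_user_demo_style(user_id: int, loader: Callable[[int], dict]) -> dict:
    # The pattern to push back on: every failure is hidden behind mock data,
    # so the demo "works" even though nothing real was loaded.
    try:
        return loader(user_id)
    except Exception:
        return {"id": user_id, "name": "Jane Doe", "mock": True}


def get_user_strict(user_id: int, loader: Callable[[int], dict]) -> dict:
    # One alternative: catch only the expected error and surface it,
    # keeping the original exception attached as the cause.
    try:
        return loader(user_id)
    except ConnectionError as exc:
        raise UserLookupError(f"could not load user {user_id}") from exc
```

With the first version the caller cannot tell a working system from a broken one; with the second the failure shows up right where it happens, which is what you want once the codebase grows past a demo.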