If you do not plan out the architecture soundly, no amount of prompting will fix it. I know this because my "handmade" project, built with backward compatibility and a horrible architecture, keeps getting badly fixed by the LLM, while the projects that rely on planning the features and architecture up front end up working right.
LLMs keep messing up even on a plain Laravel codebase.
I think that's true, but something even more subtle is going on. The quality of the LLM output depends on how it was prompted, in a way more profound than I think most people realize. If you prompt the LLM using jargon and lingo that indicate you are already experienced in the domain, the LLM will role-play an experienced developer. If you prompt it like a clueless PHB who has never coded, the LLM will output shitty code to match the style of your prompt. This extends to architecture: if your prompts are written with a mature understanding of the architecture that should be used, the LLM will follow suit, but if not, the LLM will just slap together something that looks like it might work but isn't well thought out.
This is magical thinking.
LLMs are physically incapable of generating something “well thought out”, because they are physically incapable of thinking.