Humble plug: I’ve been thinking along the same lines with Ossature for the last couple of months, since I started working on it: https://ossature.dev
The models are already good enough for code generation. What we need is the harness around them: something that deterministically enforces a specific path and “leashes” the model’s output so it stays aligned with the user’s intent as much as possible. You can’t make the output of the model deterministic, but you can make everything around it so.
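A minimal sketch of what that kind of harness could look like (everything here is hypothetical; `fake_model` stands in for any LLM call): the model stays nondeterministic, but the accept/reject logic around it is fixed code, not another prompt.

```python
import ast

def fake_model(prompt: str) -> str:
    # Stand-in for a nondeterministic model call (hypothetical).
    return "def add(a, b):\n    return a + b\n"

def deterministic_gate(code: str) -> list[str]:
    """Checks that run the same way every time, regardless of the model."""
    problems = []
    try:
        ast.parse(code)  # the output must at least be valid Python
    except SyntaxError as e:
        problems.append(f"syntax error: {e}")
    if "eval(" in code:  # example of a hard, non-negotiable policy rule
        problems.append("eval() is banned")
    return problems

def harness(prompt: str, retries: int = 3) -> str:
    for _ in range(retries):
        candidate = fake_model(prompt)
        if not deterministic_gate(candidate):
            return candidate  # accepted by the deterministic checks
    raise RuntimeError("model never produced acceptable output")

print(harness("write an add function"))
```

The gate could just as well shell out to a compiler, a linter, or the project’s test suite; the point is only that the verdict comes from outside the model.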
Trying to make enforcement work through prompts is like a government agency investigating/auditing itself: there’s no incentive to find problems, so you’ll inevitably get the “All Good, Boss!”
> so you’ll inevitably get the “All Good, Boss!”
Or the opposite, depending on how you ask the questions: some automated code review tools _always_ find issues, even when they don't really exist, or when they exist within the scope of a function but not once the function is wired into the project.