How do you check if what it produced is even the right thing? Models love to go chasing the wrong goal even when given a reasonable spec.
When the end result has problems and needs to be reworked.
You can't figure this out instantly unless you review everything the LLM produces, which I don't. So the round-trip time is pretty long, but I can now trace problems back to the original intent because I commit every architecture decision as an ADR, and that's where most of my energy goes. The ADRs are part of the repo.
Using these ADRs has helped a lot, because most of the LLM's assumptions get surfaced early and you restrict its implementation leeway.
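Roughly, an ADR in this style is just a short markdown file committed next to the code. The one below is a made-up sketch for illustration (the title, the `internal/storage` path, and the details are not from my actual project), following the common Status / Context / Decision / Consequences template:

```markdown
# ADR 0007: Keep all persistence behind a single repository interface

## Status
Accepted

## Context
The LLM kept choosing a different data-access pattern on each run (raw SQL in
handlers, an ORM, ad-hoc caching), which made generated code hard to review.

## Decision
All reads and writes go through one repository interface in `internal/storage`.
Generated code must call that interface and may not open database connections
anywhere else.

## Consequences
- Implementation leeway is limited: the model can change *what* is stored,
  not *how* storage is accessed.
- Any generated change that bypasses the interface gets rejected in review and
  traced back to this ADR.
```

Because the decision lives in the repo, the model sees it on every run and I can point review comments at a specific ADR instead of re-litigating the choice each time.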
Got it. I imagine concurrency bugs will hit hard with this approach because they show up rarely and are hard to debug.
Do they? I haven't experienced models deviating from a spec in a very long time. If anything, I feel they're being too conservative and have started asking for confirmation too often.
The problem is not the LLM deviating from the plan (though that occasionally happens too, when it thinks it has a better idea), but rather the plan not being strict enough, so the LLM decides on the fly HOW it is going to build your plan.