There are two reasons the system might fail:
1) The business person made a mistake in their conversation/specification.
In this case the LLM will have generated code and tests that match the mistake, so all the tests will pass (a small sketch of this failure mode follows after point 2). The best way to catch this before it gets to production is to have someone else review the specification. But the problem is that the specification is a long trial-and-error conversation in which later parts may contradict earlier parts. Good luck reviewing that.
2) The LLM made a mistake.
The LLM may have made the mistake because of a hallucination it cannot correct, because any attempted correction is invalidated by the same hallucination. At this point someone has to debug the system. But we got rid of all the programmers.
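Here's a minimal Python sketch of failure mode 1; the names and numbers are hypothetical, not from any real spec. The business person said "discount orders over $100" when they actually meant "$100 or more", and because the code and the test were both generated from the same conversation, the test encodes the same mistake and passes.

```python
# Sketch of failure mode 1: the spec itself is wrong, so generated code
# and generated tests agree with each other and everything stays green.
# All names and figures here are invented for illustration.

def discount(total: float) -> float:
    """Spec said 'discount orders over $100'; the business meant '$100 or more'."""
    if total > 100:          # faithfully implements the (wrong) spec
        return total * 0.9
    return total

def test_discount_matches_spec():
    # Generated from the same conversation, so it encodes the same mistake:
    # a $100 order gets no discount, and that "passes".
    assert discount(100) == 100
    assert discount(150) == 135

if __name__ == "__main__":
    test_discount_matches_spec()
    print("All tests pass, and the requirement is still wrong.")
```

The point isn't the arithmetic; it's that the code and the test share a single flawed source of truth, so "all tests green" tells you nothing about whether the spec was right.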
This still resolves as "business person asks for code, business person gets code, business person says if code is good or not, business person deploys code".
Whether the code comes from an LLM or a human doesn't make much difference.
Though it does kinda sound like you're assuming all LLMs must develop with Waterfall? That they can't e.g. use Agile? (Or am I reading too much into that?)
> business person says if code is good or not
How do they do this? They can't trust the tests because the tests were also developed by the LLM which is working from incorrect information it received in a chat with the business person.
The same way they already do with human coders whose unit tests were developed by exactly the same flawed process:
Mediocrely.
Sometimes the current process works; other times planes fall out of the sky, or an update causes millions of computers to blue-screen on startup at the same time.
LLMs in particular, and AI in general, don't need to beat humans at the same tasks.