Much business logic is really just a state machine where all the states and all the transitions need to be handled. When a state or transition is under-specified, an LLM can pass the ball back and just ask what should happen when A and B hold but not C. Or it can follow vaguer guidance on what should happen in edge cases. A typical business person is perfectly capable of describing how invoicing should work and when refunds should be issued, but very few business people can write the few thousand lines of code that cover all the cases.
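A minimal sketch of what I mean, with made-up invoice states and events (none of this is any particular business's rules):

```python
from enum import Enum, auto


class InvoiceState(Enum):
    DRAFT = auto()
    SENT = auto()
    PAID = auto()
    REFUNDED = auto()
    CANCELLED = auto()


# Explicit transition table: every (state, event) pair either maps to a new
# state or is rejected. The gaps are exactly the questions an LLM (or a
# developer) has to bounce back to the business person.
TRANSITIONS = {
    (InvoiceState.DRAFT, "send"): InvoiceState.SENT,
    (InvoiceState.SENT, "pay"): InvoiceState.PAID,
    (InvoiceState.PAID, "refund"): InvoiceState.REFUNDED,
    (InvoiceState.DRAFT, "cancel"): InvoiceState.CANCELLED,
    # What about cancelling a sent invoice? Refunding a cancelled one?
    # Under-specified -- someone has to decide.
}


def apply(state: InvoiceState, event: str) -> InvoiceState:
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"No rule for event {event!r} in state {state.name}")
```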
> an LLM can pass the ball back and just ask what should happen when A and B but not C
What should the colleagues of the business person review before deciding that the system is fit for purpose? Or what should they review when the system fails? Should they go back over the transcript of the conversation with the LLM?
Since an LLM can output source code, all of that is answerable with "exactly what they already do when talking to developers".
There are two reasons the system might fail:
1) The business person made a mistake in their conversation/specification.
In this case the LLM will have generated code and tests that both match the mistake, so all the tests will pass (see the sketch after point 2 below). The best way to catch this before it reaches production is to have someone else review the specification. But the problem is that the specification is a long trial-and-error conversation in which later parts may contradict earlier parts. Good luck reviewing that.
2) The LLM made a mistake.
The LLM may have made the mistake because of a hallucination it cannot correct: every attempt at a fix is invalidated by the same hallucination. At this point someone has to debug the system. But we got rid of all the programmers.
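To make point 1 concrete (the 90/30-day numbers are invented for illustration): if the business person says "refunds within 90 days" when the real policy is 30, the generated code and the generated test agree with each other, so the suite stays green:

```python
# Generated from the (mistaken) spec: "refunds allowed within 90 days".
REFUND_WINDOW_DAYS = 90  # real policy is 30, but nothing in the code knows that


def refund_allowed(days_since_payment: int) -> bool:
    return days_since_payment <= REFUND_WINDOW_DAYS


# Generated test, written from the same mistaken spec, so it passes.
def test_refund_allowed_at_60_days():
    assert refund_allowed(60)  # green, even though the real policy would say no
```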
This still resolves as "business person asks for code, business person gets code, business person says if code is good or not, business person deploys code".
Whether the code comes from an LLM or a human doesn't make much difference.
Though it does kinda sound like you're assuming all LLMs must develop with Waterfall? That they can't e.g. use Agile? (Or am I reading too much into that?)
> business person says if code is good or not
How do they do this? They can't trust the tests because the tests were also developed by the LLM which is working from incorrect information it received in a chat with the business person.
The same way they already do with human coders whose unit tests were developed by exactly the same flawed process:
Mediocrely.
Sometimes the current process works; other times planes fall out of the sky, or an update causes millions of computers to blue-screen on startup at the same time.
LLMs in particular, and AI in general, don't need to beat humans at the same tasks.
How does a business person today decide if a system is fit for purpose when they can't read code? How is this different?
They don't; the software engineer does that. It is different because LLMs can't test the system the way a human can.
Once the system can test the program, update the spec to fix errors in it, build the program, and ensure the result is satisfactory, we have AGI. If you argue an AGI could do it, then sure, it could, since it can replace humans at everything; but the argument was about an AI that isn't yet AGI.
The world runs on fuzzy underspecified processes. On excel sheets and post-it notes. Much of the world's software needs are not sophisticated and don't require extensive testing. It's OK if a human employee is in the loop and has to intervene sometimes when an AI-built system malfunctions. Businesses of all sizes have procedures where problems get escalated to more senior people with more decision-making power. The world is already resilient against mistakes made by tired/inattentive/unintelligent people, and mistakes made by dumb AI systems will blend right in.
> The world runs on fuzzy underspecified processes. On excel sheets and post-it notes.
Excel sheets are not fuzzy and underspecified.
> It's OK if a human employee is in the loop and has to intervene sometimes
I've never worked on software where this was OK. In many cases it would have been disastrous. Most of the time a human employee could not fix the problem without understanding the software.
All software that interacts with people, other businesses, or APIs, deals with the physical world in any way, or handles money has cases that require human intervention. That's 99.9% of software, if not more. Security updates. Hardware failures. Unusual sensor inputs. A sudden influx of malformed data. There is no such thing as an entirely autonomous system.
But we're not anywhere close to maximally automated. Today (many? most?) office workers do manual data entry and processing work that requires very little thinking. Even automating just 30% of their daily work is a huge win.