It would be helpful if you could define “useful” in this context.
Over the past year I’ve built a number of team-specific tools with LLM agents that save each of us tens of hours a month.
They don’t scale beyond me and my six coworkers, and were never designed to, but they solve challenges we’d previously worked through manually and allow us to focus on more important tasks.
The code may be suboptimal, and it will never become the basis of a new startup. I’m fine with that.
It’s also worth noting that your evidence list (increased CVEs, outages, degraded quality) is exclusively about what happens when LLMs are dropped into existing development workflows. That’s a real concern, but it’s a different conversation from whether LLMs create useful software.
My tools weren’t degraded versions of something an engineer would have built better. They’re net-new capability that was never going to get engineering resources in the first place. The counterfactual in my case isn’t “worse software”; it’s “no software.”
It really shouldn't be this hard to provide one piece of evidence. Are anecdotes about toy internal greenfield projects that could probably be built with a drag-and-drop no-code editor really the best this LLM revolution has to offer?
What is your bar for “useful”? Let’s start there and we’ll see what evidence can be offered.
User count? Domain? Scope of development?
You have something in mind, obviously.
If you have to ask me to define a very clear bar, it's obvious nothing has cleared it.
Anything that proves LLMs increase software quality. Any software built with an LLM that is actually in production, survives maintenance, doesn't have 100 CVEs, and that people actually use.