Even more than the "hallucinations", I've noticed that the code is just generally quite bad.
At least with concurrent and distributed systems stuff (which is really all I know nowadays), AI is great for getting a prototype, but the code is generally mediocre at best and pretty sub-optimal. I don't know if it's because it is trained on a lot of mediocre and/or buggy code, but for concurrency-heavy stuff I've been having to rewrite a lot of it myself.
I think that AI is great for getting a rough POC, and admittedly often a rough POC is good enough for a project (and a lot of projects never get beyond a rough POC), but I think software engineers will be needed for stuff that needs to be more polished.
I'm getting the impression that LLMs are just not very good at "reasoning" about time. I have definitely had success getting a coding agent to produce decent concurrent code, but I had to basically lead it by the nose, and I strongly suspect that in most cases it would have taken less time to just do it the old-fashioned way.
I've had good luck having it translate TLA+ specs to programming languages. The specs are written by me and my fingers, and I've done most of the interesting concurrency reasoning beforehand.
I'm pretty sure it still saves me time, and if nothing else it's an excuse to write TLA+, and that's fun.
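For a sense of what that kind of translation can look like, here is a minimal sketch. It assumes a made-up toy spec with one variable `count`, two actions `Inc` and `Dec`, and the invariant `0 <= count <= MAX`; none of these names come from a real spec, and `BoundedCounter`, `MAX`, and `run` are purely illustrative. The idea is that each action becomes a method guarded by its enabling condition, and the invariant becomes an assertion checked after every step.

```python
import threading

MAX = 3  # hypothetical bound taken from the toy spec


class BoundedCounter:
    """Hand translation of a toy spec: Init, Inc, Dec, and an invariant."""

    def __init__(self):
        self._lock = threading.Lock()
        self.count = 0  # Init == count = 0

    def _check_invariant(self):
        # Inv == 0 <= count <= MAX
        assert 0 <= self.count <= MAX

    def inc(self):
        # Inc == count < MAX /\ count' = count + 1
        with self._lock:
            if self.count >= MAX:
                return False  # action not enabled in this state
            self.count += 1
            self._check_invariant()
            return True

    def dec(self):
        # Dec == count > 0 /\ count' = count - 1
        with self._lock:
            if self.count <= 0:
                return False  # action not enabled in this state
            self.count -= 1
            self._check_invariant()
            return True


def run(n_threads=8, steps=1000):
    """Hammer the counter from several threads and return the final count."""
    counter = BoundedCounter()

    def worker(i):
        # Even-numbered threads try Inc, odd-numbered threads try Dec.
        action = counter.inc if i % 2 == 0 else counter.dec
        for _ in range(steps):
            action()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.count
```

The lock makes each action atomic, which is what the spec assumes about a step. The interesting reasoning (which interleavings are safe) still happens in the spec; the code is just a mechanical rendering of it.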
Numerous real world technical requirements can be solved with existing code, lightly modified. That’s basically LLM code’s bread and butter. The further you get from that, the closer the “time saved using LLM” line gets to zero, and once it crosses, it becomes the “time wasted using LLM” line. I think embedded and concurrent systems are going to require more unique code solutions than, say, a crud web app with a few interesting feature-building junkets.
The code is quite terrible, but no one has ever cared about code quality, at least in my experience. All they’ve ever cared about is that “it works”. It’s why an army of juniors always write most of the code.
I had this same discussion at work the other day. I had an 80k-line generated project dropped on my plate. It doesn’t use anything built into the web framework or ORM. It’s a maintenance nightmare.
I think there are plenty of projects where "good enough" really is "good enough"...maybe most apps? If you're just making a shitty simple app, I don't really care about code quality.
Example: I got Claude to generate a language server for TLA+ so I could have nice integration with Neovim. It took like 45 minutes of arguing with Claude and then it worked fine. This is incredibly low-stakes stuff: realistically the worst case scenario is that the text in the file gets screwed up, and I'm somewhat protected by Git if that happens.
That said, I am a little concerned about how cavalier people have been about deploying AI code everywhere. I don't want pacemaker firmware to be written by some intern in an afternoon with Claude.
Yes, I agree: low-stakes, low-evolution code is perfect for LLMs. The project I was handed is not that at all.
Maybe you can ask Claude to reverse engineer what the original prompt was.