Not my observation. If you never look at the code and don't have basic guardrails in place (linters, architecture tests, some guidelines for best practices) - probably.
But as soon as you do minimal reviews and high-level corrections, applications turn out just fine.
Can there be bugs? Sure. That's the price of not reading or understanding every line. How many of them you tolerate should depend on how critical your software is. (Reviewing, understanding, and testing everything 100%, the way you would if you had written it yourself, will kill most if not all of the speed you gained.)
But I never got the impression of unmaintainability or unfixable bugs.
Actually, it's the other way around: a really good cleanup pass, architectural change, or bugfix is seldom more than a few prompts and two hours away, provided your overall base is decent and you actually gave a fuck from the start.
> Can there be bugs? Sure. That's the price of not reading or understanding every line.
I've yet to come across a human developer whose output would meet this standard, despite writing every line.
In fact, having an LLM review our code is catching quite a few bugs before they reach QA.
Indeed, though I find the distribution is different.
The humans may skip unit tests and need reminding; the AI always writes unit tests once it's in AGENTS.md or whatever, but my experience* was that 5-10% of the time the LLM's attempt at a "test" would, instead of executing the code and examining the results, open the source code as a text file and run a regex to find/exclude certain substrings.
* At the start of this year, because Anthropic and OpenAI were both offering free trials. IDK how much things have changed since then; some things change fast in this domain, other things don't.
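To make the difference concrete, here is a rough sketch of the two kinds of "test". Everything in it is made up for illustration (`apply_discount`, the two source strings, the helper names); it is not taken from any real agent output. The regex-style check only looks at the text, so it passes for both versions; the behavioural check actually runs the function and catches the buggy one.

```python
import re

# Two hypothetical versions of the same function, kept as source text so the
# regex-style check has something to "open as a text file".
GOOD_SOURCE = '''
def apply_discount(price, percent):
    if not 0 <= percent <= 100:
        raise ValueError("percent out of range")
    return round(price * (1 - percent / 100), 2)
'''

BUGGY_SOURCE = '''
def apply_discount(price, percent):
    if not 0 <= percent <= 100:
        raise ValueError("percent out of range")
    return round(price * (1 + percent / 100), 2)  # bug: adds instead of subtracts
'''


def regex_test(source: str) -> bool:
    """Anti-pattern: inspects the text and never executes the code."""
    has_guard = re.search(r"raise ValueError", source) is not None
    no_eval = "eval(" not in source
    return has_guard and no_eval


def load(source: str):
    """Execute the source and return the function it defines."""
    namespace = {}
    exec(source, namespace)
    return namespace["apply_discount"]


def behaviour_test(source: str) -> bool:
    """Runs the code and examines the results."""
    fn = load(source)
    if fn(200.0, 25) != 150.0:
        return False
    try:
        fn(200.0, 150)
    except ValueError:
        return True
    return False


if __name__ == "__main__":
    for name, src in [("good", GOOD_SOURCE), ("buggy", BUGGY_SOURCE)]:
        print(name, "regex:", regex_test(src), "behaviour:", behaviour_test(src))
```

The regex check is green in both cases, which is exactly why it is worthless as a unit test.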
I’ve been piloting LLMs for the past six months nonstop and we’re at the point where formally verified models generated as an intermediate step between spec and code are very good value.
Riding the exponential means you have to update priors more often.
I have seen some pre-AI over-mocked codebases where the "tests" were essentially that (but harder to read than regex would have been).
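For anyone who hasn't run into this, a minimal sketch of what an over-mocked test tends to look like, using the standard library's `unittest.mock`. `PriceService` and its `repo` dependency are hypothetical names invented for the example. The first test replaces the method under test with a mock and then asserts on the mock, so it passes no matter what the real code does; the second mocks only the dependency and exercises the real logic.

```python
from unittest.mock import MagicMock


class PriceService:
    """Hypothetical service with one external dependency (repo)."""

    def __init__(self, repo):
        self.repo = repo

    def total(self, order_id: str) -> float:
        items = self.repo.items_for(order_id)
        return sum(item["price"] * item["qty"] for item in items)


def over_mocked_test() -> None:
    # Anti-pattern: the method under test is itself replaced by a mock,
    # so the assertions only verify the mock's configuration.
    service = PriceService(MagicMock())
    service.total = MagicMock(return_value=30.0)
    assert service.total("o-1") == 30.0
    service.total.assert_called_once_with("o-1")


def focused_test() -> None:
    # Only the dependency is mocked; the real total() runs.
    repo = MagicMock()
    repo.items_for.return_value = [
        {"price": 10.0, "qty": 2},
        {"price": 5.0, "qty": 2},
    ]
    service = PriceService(repo)
    assert service.total("o-1") == 30.0
    repo.items_for.assert_called_once_with("o-1")


if __name__ == "__main__":
    over_mocked_test()
    focused_test()
    print("both tests pass, but only one of them tests anything")
```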