This has been my (admittedly limited) experience as well. LLMs are great at initial bring-up, good at finding bugs, bad at adding features.

But I'm optimistic that this will gradually improve over time.

The only regularity I can discern in contemporary online debates about LLMs is that for every viewpoint expressed, with probability one someone else will write in with the diametrically opposite experience.

Today it’s my turn to be that person. Large scientific code base with a bunch of nontrivial, handwritten modules accomplishing tasks that are distinct but structurally similar in terms of the underlying computation. Pointed GPT Pro at it, told it what new functionality I wanted, and it churned away for 40 minutes and completely knocked it out of the park. Estimated time savings of about 3-4 weeks. I’ve done this half a dozen times over the past two months and haven’t noticed any drop-off or degradation. If anything it got even better with 5.4.

I’ve had a good, somewhat different experience with my side project (adashape.com), where most of the codebase is now written by Claude / Codex.

The codebase itself is architected and documented to be LLM-friendly, and claude.md provides a very strong harness for how to do things.

As an architect Claude is abysmal, but when you give it an existing software pattern it merely needs to extend, it’s so good that it still gives me something like a 5x boost in feature velocity.

Plus, when doing large refactorings, it forgets far fewer things than I do.

Inventing new architecture is as hard as ever, and it’s not much help there, unless you can point it to some well-documented pattern and tell it “do it like that please”.