I've been using Claude to work on a medium-sized (100+ kLoC) codebase, and it's a great productivity multiplier. Putting hours into creating a good AGENTS file has improved results a lot. I find that over time it picks up the codebase quite well. Tedious tasks that would take a day are now a matter of a few prompts.

Still... I'm not ready to give it more autonomy. Even though it gets high-level things quite well, I still look at the code, give feedback, and go through 3-4 rounds of tweaks until I'm happy with it, and also happy that I still feel I have a good handle on the codebase.

Try to codify those 3-4 rounds of tweaks into a set of rules to put into your AGENTS file. Instead of iterating, have it start over from the AGENTS file and see if it's correct now.
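For illustration only, here is a sketch of what such rules might look like. The section name and every rule below are made up, not taken from anyone's actual AGENTS file; the real ones would come from whatever feedback you keep repeating in review.

```text
## Review rules (hypothetical examples)
- Prefer extending an existing helper over adding a near-duplicate one.
- Keep each change scoped to the task; no drive-by refactors.
- New public functions need a test that fails without the change.
- Ask before adding a dependency.
```

If a fresh run from the updated file still needs the same correction, that is probably a sign the rule is too vague and needs rewording.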

Ngl, that’s gold right here. I’ve been trying to automate my sessions, and what I’ve found cool is that you can ask Claude about how to improve on how to ask Claude things, and from there ask Claude to iterate on your session cycles

In Soviet Russia the AI prompt you.

Understandable. You don’t want to lose control of your codebase, and you don’t trust that the LLM is competent enough to handle it fully.

The percentage of times I prompt Claude with "what about checking if there are any child processes running?" or "Would using a lock here greatly simplify the design?" only to turn out to be correct is approaching 100%. That is, it isn't just Claude sycophantically agreeing with me: the code itself becomes smaller, simpler, and more reliable, with fewer bugs.

The agents tend to produce working code, but the larger the scope, the bigger the mess they tend to make. They will happily evolve toward a local maximum but leave world-destroying bugs lurking in the implementation.

The other issue is that Claude regularly ignores explicit instructions in CLAUDE.md or in prompts. It will "helpfully" decide to just start doing whatever it wants, or reinterpret instructions completely differently than it did the last 100 times.

It has nothing to do with losing control or trust. LLMs are not conscious. They have no executive function. They aren't even thinking. They're just models predicting the next word in the script. They are very useful tools but that's all they are: tools.

No. Because they still hallucinate at times. Confuse things. Forget things. Or none of the above, since that is anthropomorphizing, but the result is the same. They can produce incredible working one-shots, you start to trust them, then you trust too much and... feel the result.

Yes. I am fighting the LLM's disobedience when it works through my pipeline commands. I believe these violations are caused by hallucinations, so I am still developing a mechanical system to monitor agents’ behavior automatically. I believe these routines and monitors will act as scaffolding that keeps the LLM on the right track all the time.
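A minimal sketch of what one such mechanical check could look like, assuming the pipeline is a fixed, ordered list of commands and the agent's actions are available as a plain list of the commands it ran. The names `Violation` and `check_pipeline` and the example commands are hypothetical, not part of any real tool.

```python
from dataclasses import dataclass


@dataclass
class Violation:
    index: int    # position in the agent's command log
    command: str  # the command that was flagged
    reason: str   # why the monitor flagged it


def check_pipeline(expected: list[str], executed: list[str]) -> list[Violation]:
    """Compare what the agent ran against the required ordered pipeline.

    Assumption: each required step appears once in `expected` and must
    run in that order. Anything else is reported, not silently accepted.
    """
    violations = []
    next_step = 0  # index of the next required step in `expected`
    for i, cmd in enumerate(executed):
        if next_step < len(expected) and cmd == expected[next_step]:
            next_step += 1  # the agent followed the pipeline here
        elif cmd in expected:
            violations.append(Violation(i, cmd, "out of order or repeated"))
        else:
            violations.append(Violation(i, cmd, "not part of the pipeline"))
    # Any steps the agent never reached in order are reported as missing.
    for step in expected[next_step:]:
        violations.append(Violation(len(executed), step, "required step never ran"))
    return violations


if __name__ == "__main__":
    pipeline = ["lint", "test", "build"]
    agent_log = ["lint", "build", "rm -rf tmp", "test"]
    for v in check_pipeline(pipeline, agent_log):
        print(f"[{v.index}] {v.command}: {v.reason}")
```

The point of keeping the check outside the model is that it does not depend on the LLM reporting its own behavior honestly: the monitor only looks at what was actually executed.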