I am having trouble with 4.6 following the most basic of instructions.
As an example, I asked it to commit everything in the worktree. I stressed "everything" and prompted it very explicitly, because even 4.5 sometimes likes to say, "I didn't do that other stuff, so I'm only going to commit my stuff, even though he said everything."
It still only committed a few things.
I had to ask again.
And again.
I had to ask four times, with increasing amounts of expletives and threats, before I finally saw a clean worktree. At some point I was worried it would just solve the problem by cleaning the workspace without committing anything.
4.5 is way easier to steer, despite its warts.
Tell it explicitly which git commands to run, and in what order, to get your desired outcome, instead of “commit everything in the worktree”.
This prompt will work better across any/all models.
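For what it's worth, here is a minimal sketch of what "explicit commands, in order" could look like if you script the sequence instead of describing the outcome. The function names (`commit_all_steps`, `is_clean`, `run_steps`) and the commit message are made up for illustration; only the git commands themselves are standard.

```python
import subprocess


def commit_all_steps(message):
    """Return the git commands, in order, that commit the whole worktree."""
    return [
        ["git", "add", "-A"],              # stage modified, deleted, and untracked files
        ["git", "commit", "-m", message],  # record everything that was staged
        ["git", "status", "--porcelain"],  # prints nothing when the worktree is clean
    ]


def is_clean(porcelain_output):
    """True when `git status --porcelain` printed no entries."""
    return porcelain_output.strip() == ""


def run_steps(steps, cwd="."):
    """Run each command in order, stopping at the first failure.

    Returns the stdout of the last command that ran.
    """
    out = ""
    for argv in steps:
        out = subprocess.run(
            argv, cwd=cwd, check=True, capture_output=True, text=True
        ).stdout
    return out
```

Calling `is_clean(run_steps(commit_all_steps("wip")))` inside a repository would run the three steps and report whether anything is still uncommitted. Note that `git commit` exits non-zero when there is nothing to commit, so `run_steps` raises `CalledProcessError` in that case.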
> Tell it what git commands to explicitly run and in what order
Why not run the commands yourself, then?
Changes introduced outside the agent's window create a state that differs from the agent's view of it.
When commands are run or changes are made outside the agent's control, the agent will notice that its world view has changed and eventually recover, but bringing itself up to date fills up precious context.
I have seen many cases of Claude ignoring extremely specific instructions to the point that any further specificity would take more information to express than just doing it myself.
When I run into those situations I debug and try to understand why. Agent harnesses that allow you to rewind (/tree) are useful for this.
It’s often because the context is full, I gave a bad prompt, or the context has conflicting guidance from either direct or indirect (agents.md) prompts.
It's easy to get these models to introspect and give quite detailed and intelligent responses about why they erred, and to work with them to create better instructions for future agents to follow. That doesn't solve the steering problem, however, if they still do not listen well to these instructions.
I spend 8-20 hours a day coding nonstop with agentic models, and you can believe I have tuned my approach quite a lot. This isn't a case of inexperience or conflicting instructions. The RL which gives Opus its fantastic ability to just knock out features is the same RL which causes it to constantly accumulate tech debt through short-sighted decisions.
I have run into this. The solution is to put something like “Always use `git add -A` or `git commit -a`” in your AGENTS/CLAUDE.md.
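One thing to check before relying on either form: `git commit -a` only stages files git already tracks (modified or deleted), so brand-new untracked files are left behind, while `git add -A` picks those up too. A small sketch of how you might list what `git commit -a` would miss, assuming you feed it the output of `git status --porcelain`; the helper name `left_behind_by_commit_a` is hypothetical, and paths that git quotes (spaces, special characters) are returned as printed:

```python
def left_behind_by_commit_a(porcelain_output):
    """Return untracked paths, which `git commit -a` would not include.

    In the short/porcelain status format, untracked entries start with "?? ".
    """
    return [
        line[3:]
        for line in porcelain_output.splitlines()
        if line.startswith("?? ")
    ]
```

If this returns a non-empty list, the worktree will still be dirty after `git commit -a`, which matches the "only committed a few things" symptom.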
Small, targeted commits are more professional than sweeping `git add -A` commits, but even when I specify my requirements through whichever context management system is popular this week, I still sometimes have issues with it. It seems to be much worse on the new 4.6 model.