This pain point has existed for years now and it's still not solved at all.
"If A, do X. Do B,C,D. Do A" - and it just never uses X because "it forgot".
You just can't trust that the time you spend building rules will actually pay off; in fact, you can trust that it will fail you sooner or later.
RAG, harnesses, skills... all of it was supposed to fix this, but in reality it never has.
Harnesses do fix it IMO - it’s why Claude Code and Codex had a massive jump in alleged productivity on release and then seem to have flatlined. But a custom harness _would_ allow you to do things like “on every message, run lint validation and tests”. That in and of itself would be wildly useful.
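To make the "on every message, run lint validation and tests" idea concrete, here is a rough sketch of what such a hook could look like. It assumes a harness that lets you run your own code after each model turn; `after_model_turn`, `run_checks` and the `CHECKS` list are made-up names, not the API of any real harness, and the commands are placeholders for whatever linter and test runner you actually use.

```python
import subprocess
import sys
from typing import List, Optional, Sequence

# Placeholder checks (assumption): a syntax check and unittest discovery.
# Swap in your own linter / test runner commands.
CHECKS = [
    [sys.executable, "-m", "compileall", "-q", "."],
    [sys.executable, "-m", "unittest", "discover", "-q"],
]


def run_checks(checks: Sequence[Sequence[str]], cwd: str = ".") -> List[str]:
    """Run every command and return one report per command that failed."""
    failures = []
    for cmd in checks:
        result = subprocess.run(list(cmd), cwd=cwd, capture_output=True, text=True)
        if result.returncode != 0:
            failures.append(f"$ {' '.join(cmd)}\n{result.stdout}{result.stderr}")
    return failures


def after_model_turn(checks: Sequence[Sequence[str]], cwd: str = ".") -> Optional[str]:
    """Hypothetical hook a harness would call after every model message.

    Returns text to feed back to the model, or None when every check passed.
    """
    failures = run_checks(checks, cwd)
    if not failures:
        return None
    return "Checks failed, fix these before continuing:\n\n" + "\n\n".join(failures)
```

The point of the design is that the model never gets to decide whether the checks run; the harness runs them and pushes the failures back into the conversation.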
The harnesses we have are almost stunningly incomplete IMHO. I've been trying `pi` recently, and quite like that it comes with a minimal set of tools by default -- and that I can easily override or replace the ones that it ships.
I've only just started working with it, but clamping `read/write/edit` to only allow editing files in the current directory, banning `bash` and mandating I write tools for the specific commands I want it to execute, has made me much happier. Running Claude inside a VM or similar to sandbox it is nuclear overkill; I've always been surprised that that's seemed like the state of the art.
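For anyone wondering what clamping `read/write/edit` to the current directory amounts to, this is roughly the check involved. It is a generic sketch, not `pi`'s actual extension API; `resolve_inside`, `read_tool` and `write_tool` are hypothetical names.

```python
from pathlib import Path


class PathNotAllowed(Exception):
    """Raised when a tool call points outside the allowed root."""


def resolve_inside(root: Path, requested: str) -> Path:
    """Resolve `requested` against `root` and refuse anything that escapes it."""
    root = root.resolve()
    # resolve() collapses ".." and follows symlinks, so both kinds of escape
    # show up as a path that no longer sits under root.
    candidate = (root / requested).resolve()
    if candidate != root and root not in candidate.parents:
        raise PathNotAllowed(f"{requested!r} is outside {root}")
    return candidate


def read_tool(root: Path, requested: str) -> str:
    """Read a file, but only if it lives under root."""
    return resolve_inside(root, requested).read_text()


def write_tool(root: Path, requested: str, content: str) -> None:
    """Write a file, but only if it lives under root."""
    target = resolve_inside(root, requested)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content)
```

Absolute paths are rejected for free: joining an absolute path onto `root` just yields the absolute path, which then fails the containment check.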
With a better harness, the model can't choose to rename things with search and replace; if it wants to rename things, it _must_ call the LSP to do it. If it's going to write code, as you suggest, the harness _forces_ linting/formatting to run.
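A toy version of "the harness forces it", assuming the harness owns the tool table: the only way to write is through a `write` tool that always formats, and anything not in the table simply doesn't exist for the model. `Harness`, `ToolNotAllowed` and `tidy` are invented for illustration, `tidy` is a stand-in for a real formatter, and the in-memory dict stands in for the workspace. An LSP-backed rename tool would be registered the same way, but I've left it out rather than fake an LSP client.

```python
from typing import Callable, Dict


class ToolNotAllowed(Exception):
    """Raised when the model asks for a tool the harness does not expose."""


def tidy(source: str) -> str:
    # Stand-in formatter (assumption): strip trailing whitespace and
    # end the file with exactly one newline.
    lines = [line.rstrip() for line in source.splitlines()]
    return "\n".join(lines).rstrip("\n") + "\n"


class Harness:
    def __init__(self, formatter: Callable[[str], str]) -> None:
        self.files: Dict[str, str] = {}  # in-memory stand-in for the workspace
        self.formatter = formatter
        # The model can only reach what is registered here.
        self.tools: Dict[str, Callable[..., str]] = {"write": self._write}

    def _write(self, path: str, content: str) -> str:
        # Formatting is not optional: every write goes through the formatter.
        self.files[path] = self.formatter(content)
        return self.files[path]

    def call(self, name: str, **kwargs: str) -> str:
        if name not in self.tools:
            raise ToolNotAllowed(f"no such tool: {name}")
        return self.tools[name](**kwargs)
```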
(Reading my own comment back, I am worried that the fucking AI writing style is infecting me :()
One of the problems with tools is the permissions for them. I can either grant Claude access to this one specific Python command, or give it free rein with Python to do whatever it wants, but not “you can execute the Python scripts in this directory structure”.
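The rule being asked for is not complicated to express. Here is a sketch of a permission check for "you can execute the Python scripts in this directory structure"; `is_allowed_python_call` is a hypothetical helper, not something Claude's permission system offers, and it deliberately rejects `-c`, `-m` and any other interpreter flag.

```python
from pathlib import Path
from typing import Sequence


def is_allowed_python_call(argv: Sequence[str], scripts_root: Path, cwd: Path) -> bool:
    """Allow only `python <script.py> [args...]` where the script is under scripts_root."""
    if len(argv) < 2 or Path(argv[0]).name not in {"python", "python3"}:
        return False
    script = argv[1]
    # Anything starting with "-" is an interpreter flag (-c, -m, ...), which
    # would let arbitrary code run, so it is refused outright.
    if script.startswith("-"):
        return False
    root = scripts_root.resolve()
    target = (cwd / script).resolve()  # relative paths are taken from cwd
    return target.suffix == ".py" and target.is_file() and root in target.parents
```

This only decides which script may start; it says nothing about what that script does once it is running, so the scripts in that directory still need to be ones you trust.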
Claude’s “api access required” approach means that I can’t even experiment with customising the harness without doubling up…
A colleague using OpenCode was telling me it has linting/formatting configurable at the harness level, and I can't see why this isn't in every harness.
Honestly - I think it's because it goes against the "vibe" part of the tooling - why do you care what the code looks like as long as when you run it it does what you want it to do?