You may want to try out pi-agent and create custom extensions instead.

Then codify this behavior into a process that runs automatically.

I.e. set up $repo/origin as a bare repo, then prompt it to create a shell script that creates the worktree, cds into it, runs the script you mentioned, and starts pi in it. Potentially define explicit phases for your workflow, show the current phase in the UI, and add quality gates for transitions. E.g. only allow the transition from implement to finalize if all tests succeeded. Potentially add multiple review phases here too, with different prompts. This progressively gets rid of more and more inconsistencies.

Still not a perfect solution, but on average I've had less and less to address manually with that workflow, albeit at the cost of tokens (multiple review phases obviously ingest all changes multiple times).

Pi-agent's extensibility is just a lot better than that of the other harnesses, but you could obviously also just introduce a different orchestrator to do the same. For me, pi-agent was just the least effort to get going.