I do something very similar. I have an "outside expert" script I tell my agent to use as the reviewer. It only bothers me when neither it OR the expert can figure out what the heck it is I actually wanted.
In my case I have Gemini CLI, so I tell Gemini to use the little python script called gatekeeper.py to validate it's plan before each phase with Qwen, Kimi, or (if nothing else is getting good results) ChatGPT 5.2 Thinking. Qwen & Kimi are via fireworks.ai so it's much cheaper than ChatGPT. The agent is not allowed to start work until one of the "experts" approves it via gatekeeper. Similarly it can't mark a phase as complete until the gatekeeper approves the code as bug free and up to standards and passes all unit tests & linting.
Lately Kimi is good enough, but when it's really stuck it will sometimes bother ChatGPT. Seldom does it get all the way to the bottom of the pile and need my input. Usually it's when my instructions turned out to be vague.
I also have it use those larger thinking models for "expert consultation" when it's spent more than 100 turns on any problem and hasn't made progress by it's own estimation.