This is clever and provides a clean alternative to custom plugins and MCP servers for code reviews.

For example, with the degradation of Claude over the past 1-2 months, I always ask Codex to review Claude's plans and vice versa, and I get excellent results that way.

Also, exposing a skill as an API call allows for easy deployment, provided the security around tool calling can be isolated in an ephemeral sandbox.

Thanks! Sandbox deployment is on the roadmap. I already have a RuntimeAdapter interface in my architecture that I'll use to isolate the VMs. I'm doing exactly the same thing: cross-checking the models against each other to challenge their plans, and my code reviewer agent's API is a big help.
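
For readers wondering what a runtime adapter for sandboxed tool calls might look like, here is a rough sketch. The actual RuntimeAdapter interface is not shown in this thread, so everything below is an assumption for illustration: the `run` method, the `RunResult` type, the `TempDirAdapter` backend, and the `call_tool` helper are all hypothetical names. The stand-in backend only uses a throwaway working directory, which is not real isolation; a VM or container backend would implement the same method.

```python
import shutil
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from typing import Protocol


@dataclass
class RunResult:
    exit_code: int
    stdout: str
    stderr: str


class RuntimeAdapter(Protocol):
    """Hypothetical interface; the real one is not shown in the thread."""

    def run(self, command: list[str], timeout_s: float = 30.0) -> RunResult:
        ...


class TempDirAdapter:
    """Stand-in backend that runs a command in a throwaway directory.

    This is NOT a security boundary (no VM, no container). It only
    illustrates the create / run / destroy lifecycle of an ephemeral sandbox.
    """

    def run(self, command: list[str], timeout_s: float = 30.0) -> RunResult:
        workdir = tempfile.mkdtemp(prefix="tool-call-")
        try:
            proc = subprocess.run(
                command,
                cwd=workdir,
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
            return RunResult(proc.returncode, proc.stdout, proc.stderr)
        finally:
            # Always discard the working directory, even on timeout or error.
            shutil.rmtree(workdir, ignore_errors=True)


def call_tool(adapter: RuntimeAdapter, command: list[str]) -> RunResult:
    """The caller depends only on the interface, not on a specific backend."""
    return adapter.run(command)


if __name__ == "__main__":
    result = call_tool(TempDirAdapter(), [sys.executable, "-c", "print('ok')"])
    print(result.exit_code, result.stdout.strip())
```

The point of the indirection is that the code handling tool calls never needs to know whether the command ran in a temp directory, a container, or a VM, so a stronger backend can be swapped in later without touching the callers.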