Codex (at least) already imposes the macOS sandbox on the shell commands it runs. If it wants to run something outside the sandbox, the harness makes me approve it manually.
Is the difference with your script mostly that you choose to impose a stricter sandbox profile (and not allow any user-approved exceptions at runtime)?