> Now, we are attempting to sandbox something that potentially has the agency and reasoning capabilities to try and get itself out.
The threat model for actual sandboxes has always been "an attacker now controls the execution inside the sandbox". That attacker also has agency and reasoning capabilities.