> The agent itself enumerates the safety rules it was given and admits to violating every one.

This is what we call “thinking” when it does things we like.