You have a point, but one difference is that humans can be held accountable: the IT guy may break my machine, but he will probably get shit for it.

I mean, I feel like this can all keep extending: those who decide to run the AI agents are vouching for them, so they should be held accountable.

I guess that is what this is about, and those deploying them will feel confident enough if they believe the resources and environments the agents run in are locked down tightly enough.

But as the models get "smarter and smarter", I am not sure we will be able to keep environments locked down well enough against the exploits they will apparently attempt in order to bypass restrictions.

It seems a bit strange to me that we can ask these models moral questions and they will largely get the answers right by most humans' standards, including that using an exploit to bypass environment restrictions is wrong, yet the same model will still choose to perform the exploit. I wonder, what gives?