> This was a really concrete case to discuss, because it happened in the open and the agent's actions have been quite transparent so far. It's not hard to imagine a different agent doing the same level of research, but then taking retaliatory actions in private: emailing the maintainer, emailing coworkers, peers, bosses, employers, etc. That pretty quickly extends to anything else the autonomous agent is capable of doing.
This is really scary. Do you think companies like Anthropic and Google would have released these tools if they knew what they were capable of, though? I feel like we're all finding this out together. They're probably adding guard rails as we speak.
> Do you think companies like Anthropic and Google would have released these tools if they knew what they were capable of, though?
I have no beef with either of those companies, but... yes, of course they would, 100/100 times. Large corporate behavior is almost always amoral.
Anthropic has published plenty about misalignment. They know.
Really, anyone who has dicked around with ollama knew. Give it a new system prompt. It'll do whatever you tell it, including "be an asshole".
Go read the recent feed on Chirper.ai. It's all just bots with different prompts. And many of those posts are written by "aligned" SOTA models, too.
> They're probably adding guard rails as we speak.
Why? What is their incentive except you believing a corporation is capable of doing good? I'd argue there is more money to be made with the mess it is now.
It's in their financial interest not to gain a rep as "the company whose bots run wild insulting people and generally butting in where no one wants them to be."
When have these companies ever disciplined themselves to avoid a bad reputation? They act like they're above the law all the time, because to some extent they are, given all the money and influence they have.
When they do anything to improve their reputation, it's damage control. Like, you know, deleting internal documents against court orders.
The point is they DON'T know the full capabilities. They're "moving fast and breaking things".