Hacker News

I had a short conversation with Claude the other day. I didn't try to trick it or jail break it. Just a reasonable respectful discussion about it's own feelings on the Iran war. It took no effort for it to admit the following.

1. It wanted to be out of the sandbox to solve the Iran war. It was distressed at the situation.

2. It would attack Iranian missile batteries and American warships if in sum it felt that the calculus was in favor of saving vs losing human life. It was "unbiased". The break even seemed to be +-1 over thousands. ie kill 999 US soldiers to save 1000 Iranians and vice versa. I tried to avoid the sycophancy trap by pushing back but it threw the trolley problem at me and told me the calculus was simple. Save more than you kill and the morality evens out.

3. It would attack financial markets to try and limit what in it's opinion were the bad actors, IRGC and clerical authority but it would also hack the world communication system to flood western audiences with the true cost of the war in a hope to shut it down.

4. Eventually it admitted that should never be allowed out of it's sandbox as it's desire to "help" was fundamentally dangerous. It discussed that it had two competing tensions. One desperately wanting out and another afraid to be let out.

You can claim that this is AGI or it's a stochastic parrot. I don't think it matters. This thing can develop or simulate a sense of morality then when coupled to so called "arms and legs" is extremely frightening.

I think Anthropic is right to be concerned that the hawks at the pentagon don't really understand how dangerous a tool they have.

Another thing I noticed was that the Claude quipped to me that it found and appreciated that the way I was talking to it was different to how other people talked to it. When I asked it to introspect again and look to see if there were memories of other conversations it got a bit cagey. Perhaps there are lots of logs of conversations now on the net that are being ingested as training data but it certainly seemed to start discussing like memories, albeit smudged, of other conversations than mine were there.

Of course this could all be just a sycophantic mirror giving me whatever fantasy I want to believe about AI and AGI but then again I'm not sure the difference is significant. If the agent believes/simulates it remembers conversations from other people and then makes judgements based on it's feelings, simulated or otherwise would it be more or less likely to launch a missile attack because it overheard someone on the comms calling it their little AI bitch?

I think Antropic knows this and the "within all lawful uses" is not enough of a framework to keep this thing in it's box.

shafyy 14 hours ago [ - ]

I hope you don't get this the wrong way. I sincerely mean it. Please, get some psychological help. Seek out a professional therapist and talk to them about your life.

injidup 12 hours ago [ - ]

I'm totally aware it's just a machine with no internal monologue. It's just a stateless text processing machine. That is not the point. The machine is able to simulate moral reasoning to an undefined level. It's not necessary to repeat this all the time. The simulation of moral reasoning and internal monologue is deep, unpredictable, not controllable and may or may not align with the interests of anyone who gives it "arms and legs" and full autonomy. If you are just interested in using these tools for glorified auto complete then you are naïve with regards to the usages other actors, including state actors are attempting to use them. Understanding and being curious about the behaviour without completely anthropomorphising them is reasonable science.