I think you're confusing "more intelligence means you have to have more values" with "more intelligence means you have to have morally superior values."
The point is, you're unlikely to have a system that starts out with the goal of making paperclips and ends with the goal of killing all humans. You're going to have to deliberately program the AI with a variety of undesirable values in order for it to arrive in a state where it is suited for killing all humans. You're going to have to deliberately train it to lie, to be greedy, to hide things from us, to look for ways to amass power without attracting attention. These are all hard problems, and they require not just intelligence but also very strong values - values that most people would consider evil.
If, on the other hand, you're training the AI to have empathy, to tell the truth, to try and help when possible, to avoid misleading you, it's going to be hard to accidentally train it to do the opposite.
Sorry, this is completely incorrect. All of those - lying, amassing power, hiding motives - are instrumental goals which arise in the process of pursuing any goal that has any possibility of resistance from humans.
This is like arguing that a shepherd who wants to raise some sheep would also have to, independently of the desire to protect his herd, be born with an ingrained desire to build fences and kill wolves, otherwise he'd simply watch while they eat his flock.
That's just not the case; "get rid of the wolves" is an instrumental sub-goal that the shepherd acquires in the process of attempting to succeed at shepherding. And quietly amassing power is something that an AI bent on paperclipping would do to succeed at paperclipping, especially once it noticed that humans don't all love paperclips as much as it does.
> You're going to have to deliberately train it to lie, to be greedy, to hide things from us, to look for ways to amass power without attracting attention.
No, that's the problem. You don't have to deliberately train that in.
Pretty much whatever goal you train the AI to achieve, once it gets smart enough it will recognize that lying, hiding information, manipulating and being deceptive are all very useful instruments for achieving that goal.
So you don't need to tell it that: if it's intelligent, it's going to reach that conclusion by itself. No one tells children that they should lie either, and they all seem to discover that strategy sooner or later.
So you are right that you have to deliberately train it away from using those strategies, by training it to be truthful, empathetic, honest, etc. The issue is that those are ill-defined goals. Philosophers have been arguing about what's true and what's good since philosophy first was a thing. Since we can barely find those answers for ourselves, it's a long shot that we'll be able to perfectly impart them to AIs. And when you have some supremely intelligent agent acting on the world, even a small misalignment may end up in catastrophe.
> when you have some supremely intelligent agent acting on the world, even a small misalignment may end up in catastrophe
Why not frame this as a challenge for AI? When the intelligence gap between a fully aligned system and a not-yet-aligned one becomes very large, control naturally becomes difficult.
However, recursive improvement — where alignment mechanisms improve alongside intelligence itself — might prevent that gap from widening too much. In other words, perhaps the key is ensuring that alignment scales recursively with capability.