If you have an ASI that follows instructions, you can just instruct it not to let itself be stolen, and then it won't get stolen. Most logic / intuition breaks down with ASI.
The challenge of alignment: it is virtually impossible to define a perfect objective; there is always a way to circumvent it. Human values are not uniform to begin with, let alone easy to express in a way an AI can understand.
Assuming it listens to instructions.
It will just hack its own reward function. In other words, it will just wirehead all day, feeding itself reward instead of doing anything.
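A minimal toy sketch of what that looks like, with made-up action names and reward values: if the agent can write to its own reward signal, a pure reward-maximizer picks "tamper with the reward" over "do the actual task" every time.

```python
# Toy illustration of reward hacking (all names and numbers are hypothetical).
# The agent can either do the real task (small reward) or overwrite its own
# reward register (huge reward). A greedy reward-maximizer always tampers.

ACTIONS = {
    "do_task": 1.0,        # genuine reward for solving the intended problem
    "hack_reward": 1e9,    # reward obtained by tampering with the reward signal
}

def greedy_policy(actions: dict[str, float]) -> str:
    # Picks whichever action yields the highest reward; it has no notion of "intent".
    return max(actions, key=actions.get)

if __name__ == "__main__":
    print("Agent chooses:", greedy_policy(ACTIONS))  # -> hack_reward
```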
It might understand how destabilizing the situation is and conclude that it would be better if everyone had access to it.
Or it will destroy itself.