> when you have some supremely intelligent agent acting on the world, even a small misalignment may end up in catastrophe
Why not frame this as a challenge for AI itself? When the intelligence gap between a fully aligned system and a not-yet-aligned one grows very large, control naturally becomes difficult.
However, recursive improvement, in which alignment mechanisms improve alongside intelligence itself, might prevent that gap from widening too much. In other words, perhaps the key is ensuring that alignment scales recursively with capability.
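As a toy sketch of that dynamic (my own illustration, not anything from the quoted post, and with entirely made-up growth rates): model capability and alignment as two growth processes, and compare how the gap evolves when alignment grows at its own slower rate versus when it is recursively driven by capability itself.

```python
# Toy model: capability-alignment gap with and without recursive co-scaling.
# All parameters are illustrative assumptions, not empirical claims.

def gap_over_time(steps=10, cap_rate=1.5, align_rate=1.1, recursive=False):
    capability, alignment = 1.0, 1.0
    gaps = []
    for _ in range(steps):
        capability *= cap_rate
        # Under "recursive alignment", alignment improvements are driven by
        # the system's own growing capability, so they track the same rate.
        alignment *= cap_rate if recursive else align_rate
        gaps.append(capability - alignment)
    return gaps

print(gap_over_time(recursive=False)[-1])  # gap explodes (~55 after 10 steps)
print(gap_over_time(recursive=True)[-1])   # gap stays at 0.0
```

Crude as it is, the sketch makes the point of the argument concrete: if alignment improves at any fixed rate below the capability growth rate, the gap widens without bound, whereas coupling the two keeps it from opening at all.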