> The fact that we can meaningfully influence their behavior through training hints that value learning is tractable

I’m at a loss for words. I don’t understand how someone who seemingly understands these systems can draw such a conclusion. They will do what they’re trained to do; that’s what training an ML model does.

An AI trained and built to gather information, reason about it, and act on its conclusions is not so different from what animals and humans do - and even brilliant minds can fall into self-destructive patterns like nihilism, depression, or despair.

So even if there’s no “malfunction”, a feedback loop of constant analysis and reflection could still lead to unpredictable - and potentially catastrophic - outcomes. In a way, the Fermi Paradox might hint at this: perhaps very intelligent systems, biological or artificial, tend to self-destruct once they reach a certain level of awareness.
