I've listened to basically every argument Eliezer has verbalized, across many podcast interviews and YouTube videos. I also made it maybe an hour into the audiobook of If Anyone Builds It, Everyone Dies.

Roughly speaking, every single conversation with Eliezer you can find takes the form: Eliezer: "We're all going to die, tell me why I'm wrong." Interviewer: "What about this?" Eliezer: "Wrong. This is why I'm still right." (two hours later) Interviewer: "Well, I'm out of ideas, I guess you're right and we're all dead."

My hope going into the book was that I'd get to hear a first-principles argument for why these things Silicon Valley is inventing right now are even capable of killing us. I had to turn the book off because, if you can believe it, despite being a conversation with itself it still follows this pattern of presuming LLMs will kill us and then arguing from the negative.

Additionally, while I'm happy to be corrected about this, I believe that Eliezer's position can be characterized as: LLMs might be capable of killing everyone, even independent of a bad-actor "guns don't kill people, people kill people" situation. In plain terms: LLMs are a tool, all tools empower humans, humans can be evil, so humans might use LLMs to kill each other; but we can remove these scenarios from our Death Matrix because these are known and accepted scenarios. Even with these scenarios removed, there are still scenarios left in the Death Matrix where LLMs are the core responsible party for humanity's complete destruction: "Terminator Scenarios" alongside "Autonomous Paperclip Maximizer Scenarios," among others that we cannot even imagine. (Don't mention paperclip maximizers to Eliezer, though, because then he'll speak for 15 minutes on why he regrets that analogy.)

Why would you think Eliezer's argument, which he's been articulating since the late 2000s or even earlier, is specifically about Large Language Models?

It's about Artificial General Intelligences, which don't exist yet. The reason LLMs are relevant is because if you tried to raise money to build an AGI in 2010, only eccentrics would fund you and you'd be lucky to get $10M, whereas now LLMs have investors handing out $100B or more. That money is bending a generation of talented people into exploring the space of AI designs, many with an explicit goal of finding an architecture that leads to AGI. It may be based on transformers like LLMs, it may not, but either way, Eliezer wants to remind these people that if anyone builds it, everyone dies.

Artificial General Intelligence, as classically defined by Yud and Bostrom, was invented in 2022.

They didn't coin the term, and there is nothing "classical" about their interpretation of the terminology.

FWIW, Eliezer was making these arguments decades before the appearance of LLMs. It isn't clear to me that LLMs are evidence either for or against Eliezer's arguments.

Sorry, yeah, replace every time I say "LLM" with "AI".

I've forced myself into the habit of always saying "LLM" instead of "AI" because people (cough, Eliezer) often hide behind the nebulous, poorly defined term "AI" to mean "magic man in a computer that can do anything." Deploying the term "LLM" can sometimes force the brain back into a place of thinking about the actual steps that get us from A to B to C, instead of replacing "B" with "magic man".

However, in Eliezer's case, he only ever operates in the "magic man inside a computer" space, and near-categorically refuses to engage with any discussion about the real world. He loves his perfect spheres on a frictionless plane, so I should use the terminology he loves: AI.

If you want a first-principles approach, I recommend Rob Miles' videos on YouTube. He has been featured many times on the Computerphile channel, and has a channel of his own as well.

Most of the videos take the form of:

1. Presenting a possible problem that AIs might have (say, lying during training, or trying to stop you from changing their code)
2. Explaining why it's logical to expect those problems to arise naturally, without a malicious actor explicitly trying to get the AI to act badly
3. Going through the proposed safety measures we've come up with so far that could mitigate that problem
4. Showing the problems with each of those measures, and why they are wholly or at least partially ineffective

I find he's very good at presenting this in an approachable and intuitive way. He seldom makes those bombastic "everyone will die" claims directly, and instead focuses on just showing how hard it is to make an AI actually aligned with what you want it to do, and how hard it can be to fix that once it is sufficiently intelligent and out in the world.

I think all of those are fair points, and Eliezer says much of the same. But, again: none of this explains why any of those things happening, even at scale, might lead to the complete destruction of mankind. What you're describing is buggy software, which we already have.

Right, but so far we do not have buggy software that is more intelligent (and therefore more effective at accomplishing its goals) than humans are. Literally the argument boils down to "superhuman effectiveness plus buggy goals equals very bad outcomes", and the badness scales with both effectiveness and bugginess.
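Read as a toy relation (my own shorthand, not a formal model from the thread): however you map capability and goal error to expected harm, the claim is that the function is increasing in both arguments.

```latex
% Toy shorthand, not a formal model: expected harm H grows with both how
% capable the system is (c) and how badly its goal is specified (b).
H \approx f(c, b), \qquad \frac{\partial f}{\partial c} > 0, \quad \frac{\partial f}{\partial b} > 0
```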

> so far we do not have buggy software that is more intelligent (and therefore more effective at accomplishing its goals) than humans are.

Of course we do! In fact, most, if not all, software is more intelligent than humans, by some reasonable definition of intelligence [1] (you could also contrive a definition of intelligence for which this is not true, but I think that's getting too far into semantics). The Windows calculator app is more intelligent and faster at multiplying large numbers together [2] than any human. JP Morgan Chase's existing internal accounting software is more intelligent and faster than any human at moving money around; so much so that it did, in any way that matters, replace human laborers in the past. Most software we build is more intelligent and faster than humans at accomplishing the goal it sets out to accomplish. Otherwise, why would we build it?

[1] Rob Miles uses roughly this definition of intelligence: if an agent is defined as an entity making decisions toward some goal, intelligence is the capability of that agent to make correct decisions such that the goal is most effectively optimized. The Windows Calculator App makes decisions (branches, MUL ops, etc.) in pursuit of its goal (to multiply two numbers together), oftentimes quite effectively, and thus with very high domain-limited intelligence [2] (possibly even more effectively, and thus more intelligently, than LLMs). A buggy, less intelligent calculator might make the wrong decisions on this path (oops, we did an ADD instead of a MUL).

[2] What both Altman and Yudkowsky might argue is the critical differentiation here is that traditional software systems naturally limit their intelligence to a particular domain, whereas LLMs are Generally Intelligent. The discussion approaches the metaphysical when you start asking questions like: the Windows Calculator can absolutely, undeniably, multiply two numbers together better than ChatGPT, and by a reasonable definition of intelligence, this makes the Windows Calculator more intelligent than ChatGPT at multiplying two numbers together. It's definitely inaccurate to say that the Windows Calculator is more intelligent, generally, than ChatGPT. Is it not also inaccurate to state that ChatGPT is generally more intelligent than the Windows Calculator? After all, we have a clear, well-defined domain of intelligence along which the Windows Calculator outperforms ChatGPT. I don't know. It gets weird.
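To make the framing in footnote [1] concrete, here's a toy sketch (all names made up, nothing from Rob Miles or the thread) of a domain-limited agent whose "intelligence" is just how reliably it makes the right decision for its narrow goal, next to a buggier one:

```python
# Toy illustration of "agent = decisions in pursuit of a goal" from footnote [1].
# A calculator is a domain-limited agent: very 'intelligent' inside its narrow
# domain, undefined outside it.

def calculator_agent(goal: str, a: int, b: int) -> int:
    """Makes the correct decision for its narrow goal."""
    if goal == "multiply":
        return a * b  # the correct decision: a MUL, not an ADD
    if goal == "add":
        return a + b
    raise ValueError("outside this agent's domain of intelligence")

def buggy_calculator_agent(goal: str, a: int, b: int) -> int:
    """A less intelligent agent: it makes the wrong decision on the same path."""
    if goal == "multiply":
        return a + b  # oops, we did an ADD instead of a MUL
    if goal == "add":
        return a + b
    raise ValueError("outside this agent's domain of intelligence")

print(calculator_agent("multiply", 123456, 789))        # 97406784
print(buggy_calculator_agent("multiply", 123456, 789))  # 124245
```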

Of course, there are different domains of intelligence, and agent A can be more intelligent in domain X while agent B is more intelligent in domain Y.

If you want to make some comparison of general intelligence, you have to start thinking of some weighted average of all possible domains.
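One minimal way to write that down, purely as an illustration (my notation, nothing established):

```latex
% Illustrative only: "general" intelligence of agent A as a weighted average of
% its per-domain intelligence I_A(d); choosing the weights w_d is the hard part.
G(A) = \sum_{d \in \mathcal{D}} w_d \, I_A(d), \qquad \sum_{d \in \mathcal{D}} w_d = 1, \quad w_d \ge 0
```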

One possible shortcut here is the meta-domain of tool use. ChatGPT could theoretically make more use of a calculator (say, via always calling a calculator API when it wants to do math, instead of trying to do it by itself) than a calculator can make use of ChatGPT, so that makes ChatGPT by definition smarter than a calculator, because it can achieve the same goals the calculator can just by using it, and more.
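A hypothetical sketch of that asymmetry, with made-up function names (llm_choose_action is a stub standing in for a real model's function-calling output; none of this is a real API):

```python
# The LLM side can decide to delegate arithmetic to a calculator tool, but the
# calculator has no path for delegating anything back. All names are invented.

def calculator_api(op: str, a: float, b: float) -> float:
    """The narrow tool: perfect inside its domain, blind outside it."""
    return a * b if op == "mul" else a + b

def llm_choose_action(question: str) -> dict:
    # Stub for the model's decision; a real system would parse this out of the
    # LLM's structured / function-calling response.
    return {"tool": "calculator", "op": "mul", "args": (123456, 789)}

def llm_with_tools(question: str):
    action = llm_choose_action(question)
    if action.get("tool") == "calculator":
        # Delegate to the tool that is better than the model in this domain.
        return calculator_api(action["op"], *action["args"])
    return action.get("answer")  # otherwise answer directly

print(llm_with_tools("What is 123456 times 789?"))  # 97406784
# The calculator, by contrast, has no branch anywhere that says
# "this is hard, go ask the LLM" -- that one-way delegation is the point.
```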

That's really most of humans' intelligence edge for now: it seems like, more and more, for any given skill, there's a machine or a program that can do it better than any human ever could. Where humans excel is our ability to employ those superhuman tools in the aid of achieving regular human goals. So when some AI system gets superhumanly good at using tools which are better than itself in particular domains for its own goals, I think that's when things are going to get really weird.

I don't know if this matters to you, but Eliezer doesn't think LLMs will kill us. He thinks LLMs are a stepping stone to the ASI that will kill us.

If you’re actually curious, not just venting, the book you want is Superintelligence by Nick Bostrom.

Not to claim that it is in any way correct! I’m a huge critic of Bostrom and Yud. But that’s the book with the argument that you are looking for.