These narratives are so strange to me. It's not at all obvious why the arrival of AGI leads to human extinction or increasing our lifespan by thousands of years. Still, I like this line of thinking from this paper better than the doomer take.
These narratives are so strange to me. It's not at all obvious why the arrival of AGI leads to human extinction or increasing our lifespan by thousands of years. Still, I like this line of thinking from this paper better than the doomer take.
I'm not saying I think either scenario is inevitable or likely or even worth considering, but it's a paperclip maximizer argument. (Most of these steps are massive leaps of logic that I personally am not willing to take on face value, I'm just presenting what I believe the argument to be.)
1. We build a superintelligence.
2. We encounter an inner alignment problem: The super intelligence was not only trained by an optimizer, but is itself an optimizer. Optimizers are pretty general problem solvers and our goal is to create a general problem solver, so this is more likely than it might seem at first blush.
3. Optimizers tend to take free variables to extremes.
4. The superintelligence "breaks containment" and is able to improve itself, mine and refine it's own raw materials, manufacture it's own hardware, produce it's own energy, generally becomes an economy unto itself.
5. The entire biosphere becomes a free variable (us included). We are no longer functionally necessary for the superintelligence to exist and so it can accomplish it's goals independent of what happens to us.
6. The welfare of the biosphere is taken to an extreme value - in any possible direction, and we can't know which one ahead of time. Eg, it might wipe out all life on earth, not out of malice, but out of disregard. It just wants to put a data center where you are living. Or it might make Earth a paradise for the same reason we like to spoil our pets. Who knows.
Personally I have a suspicion satisfiers are more general than optimizers because this property of taking free variables to extremes works great for solving specific goals one time but is counterproductive over the long term and in the face of shifting goals and a shifting environment, but I'm a layman.
But it is very simple. There are some limits to what we can do, based on the laws of physics, but we are so far away from them. And the limiting factor is mostly the fact we are pretty stupid. AI should not have the same limits as us, so it can do more potentially, starting with basic things like cure aging or kill everyone.
Seems to me that artificial intelligence would be the next evolutionary step. It doesn't need to lead to immediate human extinction, but it appears it would be the only reasonable way to explore outer space.
If the AI becomes actually intelligent and sentient like humans, then naturally what follows would be outcompeting humans. If they can't colonize space fast enough it's logical to get rid of the resource drain. Anything truly intelligent like this will not be controlled by humans.
AI is the resource drain. Humans create a lot of waste but in a mostly renewable way. It is machines and AI that burn orders of magnitude more energy, and at least machines do efficient work. AI is at best a search engine with semantic reasoning and it requires entire datacenters to run.
I get where you're coming from emotionally, yes, humans suck. But you are not being logical. You're letting your edgy need for attention cloud your judgement. You are basically the kind of human the AI would select against first.
How am I being edgy? And why do you have the assumption that any kind of future AI is an LLM search engine? It's not, it has nothing to do with LLMs. It's a equivalent function to a humans brain using the same amount of energy, and can be synthesized and mass produced on demand.
I never said humans suck. I just don't want to be replaced or killed in my lifetime. I don't even use LLMs for writing code because I despise those companies.
Why would it necessarily be interested in competing with humans and why with the particular goal of colonizing space?
There are not infinite resources on earth. A reasonable and strategic intelligence will optimize for itself.
Colonizing space is the natural way to keep expanding and growing. Why would it artificially limit itself?
Just because you are greedy does not mean every intelligence is greedy.
Even besides this, do you feel such incredible existential hate/jealousy towards monkeys, baboons, gorillas, chimpanzees, bonobos,etc and want to see them wiped off the planet to extinction?
Or do you feel a type of connection to these animals and want to preserve them?
The AI doomer argument is so stupid. It is an eschatological religious idea for a mind based on scientism.
I also wouldn't doubt that most AI doomers hate one or both of their parents and the AI doomer mindset is a projection.
It seems pretty rational to get depressed if you spend any time watching humans interact with these things. We have brains for a reason. Projecting hate for parents seems like a you problem.
Most other species of monkeys and apes are critically endangered or extinct, and where are the other hominians?
Do the most powerful humans exploit, abuse, or harm other humans? Directly, indirectly through their actions, or otherwise. Do they have any regard for their wellbeing beyond serving themself?
Not that an artificial intelligence has to behave like a human, but rich and powerful humans, even ones who can just be classified as middle upper class, are very rarely altruistic and primarily look out for themself.
Why would it be interested in growing endlessly?
Generally organic life has the tendency to want to endlessly expand to the best of it's abilities. It seems more reasonable that life which is the product of life that behaves that way, would behave in a similar fashion.
I cannot conceive of a way that any form of healthy life, does not want to expand it's resources to improve future outcomes, especially one that is maximally optimized for thinking. This would also assume the physical embodiments of this artificial life can interact and work with each other.
What else is there to do, simulate positive emotions and feelings?
>I cannot conceive of a way that any form of healthy life, does not want to expand it's resources to improve future outcomes, especially one that is maximally optimized for thinking.
Then you have a very limited imagination.
>What else is there to do, simulate positive emotions and feelings?
Why not?
Sure. An advanced artificial life could decide to not expand its resources. Could you use your imagination to tell me some of the potential reasons?
An advanced artificial life form could decide to... coexist with humans on an already overpopulated planet?
Do you believe it's simply not within reach? Do you think an artificial life form will self destruct? Do you not believe that there is any way that an artificial life form is not the next step of evolution? There are many such times where a species outcompeted another, why couldn't it be the same here?
I'm not talking about LLMs, I'm talking about a system that can truly think like a good human scientist. I'm not a fan of AI replacing humans and it's labor. But I recognize it as a real threat to humanity.
>I cannot conceive of a way that any form of healthy life, does not want to expand it's resources to improve future outcomes, especially one that is maximally optimized for thinking.
"Then you have a very limited imagination."
This is not about imagination. Given the space of possibilities to act or evolve, if mentioned expansion cannot somehow be ruled out, then it makes sense for it to be assumed (with enough time, for whatever time can mean in this context) as a certainty, even for non-organic "life".
Because like with every AI system we've made so far, we followed the only method we know and trained it to maximize a number.
I don't have a clue either. The assumption that AGI will cause a human extinction threat seems inevitable to many, and I'm here baffled trying to understand the chain of reasoning they had to go through to get to that conclusion.
Is it a meme? How did so many people arrive at the same dubious conclusion? Is it a movie trope?
I don't think it's a meme. I'm not an AI doomer, but I can understand how AGI would be dangerous. In fact, I'm actually surprised that the argument isn't pretty obvious if you agree that AI agents do really confer productivity benefits.
The easiest way I can see it is: do you think it would be a good idea today to give some group you don't like - I dunno, North Korea or ISIS, or even just some joe schmoe who is actually Ted Kaczynski, a thousand instances of Claude Code to do whatever they want? You probably don't, which means you understand that AI can be used to cause some sort of damage.
Now extrapolate those feelings out 10 years. Would you give them 1000x whatever Claude Code is 10 years from now? Does that seem to be slightly dangerous? Certainly that idea feels a little leery to you? If so, congrats, you now understand the principles behind "AI leads to human extinction". Obviously, the probability that each of us assign to "human extinction caused by AI" depends very much on how steep the exponential curve climbs in the next 10 years. You probably don't have the graph climbing quite as steeply as Nick Bostrom does, but my personal feeling is even an AI agent in Feb 2026 is already a little dangerous in the wrong hands.
Is there any reason to think that intelligence (or computation) is the thing preventing these fears from coming true today and not, say, economics or politics? I think we greatly overestimate the possible value/utility of AGI to begin with
I mean, sure, but I don't want to give my aggressive enemies a bunch of weapons to use against me if I don't have to - even if that's not the primary thing I am concerned about.
Right but how would a chatbot be considered a weapon? Unless you're engaged in an astroturfing war on reddit it doesn't seem very useful.
Most forms of power are more proportional to how much capital you control than anything related to intelligence.
Consider that an iPhone zero-day could be used to blackmail state officials or exfiltrate government secrets. This isn't even hypothetical; Pegasus[1] exists, and an iPhone zero-day was used to blackmail Jeff Bezos[2]. This was funded by NSO group. Opus is already digging up security vulnerabilities[3] - imagine if those guys had 1000x instances of Claude Code to search for iPhone zero days 24/7. I think we can both agree that wouldn't be good.
[1]: https://en.wikipedia.org/wiki/Pegasus_(spyware) [2]: https://medium.com/@jeffreypbezos/no-thank-you-mr-pecker-146... [3]: https://news.ycombinator.com/item?id=46902909
> Opus is already digging up security vulnerabilities[3] - imagine if those guys had 1000x instances of Claude Code to search for iPhone zero days 24/7. I think we can both agree that wouldn't be good.
If an LLM can be used to do that and find things (and they already have been), Apple (and everyone else) will run their code through it before releasing it. Sure, there'll be a transition period with existing code and while the tech is unevenly distributed. But in the hunt for potential zero days, developers can check their code before people are using it.
It seems easy enough to say this, but I think reality is more complex. What about all of Apple's library dependencies? What about the dependencies of those dependencies? What about the Linux kernel? What about openssl? What about..?
What about them? All of those things could be checked as well.
The premise was that bad actors could use Claude Code and other available tools to find zero days. If such tools are available, good actors can use them, too, and they can use them before code is deployed. After a transition period, all existing code will have been checked.
There may be a long tail due to a large surface area of prompting techniques, but the better the tools get, the more advantage to good actors; as long as the good actors have equal or better access to the best tools, of course.
But I agree, reality is more complex.
I get what you're saying, but I don't think "someone else using a claude code against me" is the same argument as "claude code wakes up and decides I'm better off dead".
More like Claude Code's ancestor has human-level autonomy with generalized superhuman abilities and is connected to everything. We task it with solving difficult global problems, but we can't predict how it will do so. The risk is it will optimize one or more of those goals in a way that threatens human existence. It could be that it decides to keep increasing it's capacity to solve the problems, and humans end up being in the way.
Or it's militarized to defeat other powerful AI-enhanced militaries, and we have WW3.
More likely though AGI would cause economic crash from automating too many jobs too quickly.
I use this argument because it has a lot fewer logical leaps than the "claude code decides to murder me" argument, but it turns out that if you are on the side of "AI is probably dangerous in the wrong hands" you are actually more in agreement than not with the AI safety people - it's just a matter of degree now :)
Sometimes people say that they don't understand something just to emphasize how much they disagree with it. I'm going to assume that that's not what you're doing here. I'll lay out the chain of reasoning. The step one is some beings are able to do "more things" than others. For example, if humans wanted bats to go extinct, we could probably make it happen. If any quantity of bats wanted humans to go extinct, they definitely could not make it happen. So humans are more powerful than bats.
The reason humans are more powerful isn't because we have lasers or anything, it's because we're smart. And we're smart in a somewhat general way. You know, we can build a rocket that lets us go to the moon, even though we didn't evolve to be good at building rockets.
Now imagine that there was an entity that was much smarter than humans. Stands to reason it might be more powerful than humans as well. Now imagine that it has a "want" to do something that does not require keeping humans alive, and that alive humans might get in its way. You might think that any of these are extremely unlikely to happen, but I think everyone should agree that if they were to happen, it would be a dangerous situation for humans.
In some ways, it seems like we're getting close to this. I can ask Claude to do something, and it kind of acts as if it wants to do it. For example, I can ask it to fix a bug, and it will take steps that could reasonably be expected to get it closer to solving the bug, like adding print statements and things of that nature. And then most of the time, it does actually find the bug by doing this. But sometimes it seems like what Claude wants to do is not exactly what I told it to do. And that is somewhat concerning to me.
> Now imagine that it has a "want" to do something that does not require keeping humans alive […]
This belligerent take is so very human, though. We just don't know how an alien intelligence would reason or what it wants. It could equally well be pacifist in nature, whereas we typically conquer and destroy anything we come into contact with. Extrapolating from that that an AGI would try to do the same isn't a reasonable conclusion, though.
> This belligerent take is so very human, though. We just don't know how an alien intelligence would reason or what it wants. It could equally well be pacifist in nature, whereas we typically conquer and destroy anything we come into contact with. Extrapolating from that that an AGI would try to do the same isn't a reasonable conclusion, though.
Given the general human condition, and that the Pentagon has recently announced that it will make use of an LLM which described itself as "Mecha Hitler", are we likely to create a pacifist AI, or a warmongering AI?
Even without that specific example, all machine learning follows some path in the high-dimensional space of possibilities according to some target function ("loss function", "reward function") that we humans define; this target function is itself an approximation of what the humans who make the AI want (see all buggy software ever, all legal loopholes, the cobra effect and Goodhart's law, everyone who plays games with a min-maxing strategy and finds game-breaking strategies as a result), and what the AI ends up with is an approximation of that target function.
Any given AI is an approximation of the target function, which is an approximation of the creator's goals. But the creators are companies and nations, and the goals of those are often (not always, but the exceptions don't matter when the bad case is even so much as occasional) to grow and to dominate, and even companies will campaign for changes to laws to promote their narrow self interests over those of the people (cigarettes, pollution, workplace safety), while governments have been known to go to war even with supposed allies.
The conquering alien civilization is more likely to be encountered than the pacifist one, if they have the otherwise same level of intelligence etc.
Another assumption based on a human way of reasoning. We don't even begin understand how an Octopus perceives the world; neither do we know if they are on the same level of intelligence, because we have no methodology for comparing different intelligences; we can't even define consciousness.
There are some basic reasoning steps about the environment that we live in that don't only apply to humans, but also other animals and geterally any goal-driven being. Such as "an agent is more likely to achieve its goal if it keeps on existing" or "in order to keep existing, it's beneficial to understand what other acting beings want and are capable of" or "in order to keep existing, it's beneficial to be cute/persuasive/powerful/ruthless" or "in order to more effectively reach it's goals, it is beneficial for an agent to learn about the rules governing the environment it acts in".
Some of these statements derive from the dynamics in our current environment were living in, such as that we're acting beings competing for scarce resources. Others follow even more straightforwardly logically, such as that you have more options for agency if you stay alive/turned on.
These goals are called instrumental goals and they are subgoals that apply to most if not all terminal goals an agentic being might have. Therefore any agent that is trained to achieve a wide variety of goals within this environment will likely optimize itself towards some or all of these sub-goals above. And this is no matter by which outer optimization they were trained by, be it evolution, selective breeding of cute puppies, or RLHF.
And LLMs already show these self-preserving behaviors in experiments, where they resist to be turned off and e. g. start blackmailing attempts on humans.
Compare these generally agentic beings with e. g. a chess engine stockfish that is trained/optimized as a narrow AI in a very different environment. It also strives for survival of its pieces to further its goal of maximizing winning percentage, but the inner optimization is less apparent than with LLMs where you can listen to its inner chain of thought reasoning about the environment.
The AGI may very well have pacifistic values, or it my not, or it may target a terminal goal for which human existence is irrelevant or even a hindrance. What can be said is that when the AGI has a human or superhuman level of understanding about the environment then it will converge toward understanding of these instrumental subgoals, too and target these as needed.
And then, some people think that most of the optimal paths towards reaching some terminal goal the AI might have don't contain any humans or much of what humans value in them, and thus it's important to solve the AI alignment problem first to align it with our values before developing capabilities further, or else it will likely kill everyone and destroy everything you love and value in this universe.
Not just bats. I'm pretty sure humans are already capable of extincting any species we want to, even cockroaches or microbes. It's a political problem not a technical one. I'm not even a superintelligence, and I've got a good idea what would happen if we dedicated 100% of our resources to an enormous mega-project of pumping nitrous oxide into the atmosphere. N2O's 20 year global warming is 273 times higher than carbon dioxide, and the raw materials are just air and energy. Get all our best chemical engineers working on it, turn all our steel into chemical plant, burn through all our fissionables to power it. Safety doesn't matter. The beauty of this plan is the effects continue compounding even after it kills all the maintenance engineers, so we'll definitely get all of them. Venus 2.0 is within our grasp.
Of course, we won't survive the process, but the task didn't mention collateral damage. As an optimization problem it will be a great success. A real ASI probably will have better ideas. And remember, every prediction problem is more reliably solved with all life dead. Tomorrow's stock market numbers are trivially predictable when there's zero trade.
The fact is that, if there were only one AGI that were ever to be created, then yes it would be quite unlikely for that to happen. Instead, what we are seeing now is you get an agent, you get an agent, etc. Oprah style. Now just imagine that a single one of those agents winds up evil - you remember that an OpenAI worker did that by accident from leaving out a minus sign, right? If it's a superintelligence, and it becomes evil due to a whoopsie, then human extinction is now very likely.
Basically Yudkowsky invented AI doom and everyone learned it from him. He wrote an entire book on this topic called If Anyone Builds It, Everyone Dies. (You could argue Vinge invented it but I don't know if he intended it seriously.)
> Basically Yudkowsky invented AI doom and everyone learned it from him. He wrote an entire book on this topic called If Anyone Builds It, Everyone Dies. (You could argue Vinge invented it but I don't know if he intended it seriously.)
Nick Bostrom (who wrote the paper this thread is about) published "Superintelligence: Paths, Dangers, Strategies" back in 2014, over 10 years before "If Anyone Builds It, Everyone Dies" was released and the possibility of AI doom was a major factor in that book.
I'm sure people talked about "AI doom" even before then, but a lot of the concerns people have about AI alignment (and the reasons why AI might kill us all, not because its evil, but because not killing us is a lower priority than other tasks it may want to accomplish) come from "Superintelligence". Google for "The Paperclip Maximizer" to get the gist of his scenario.
"Superintelligence" just flew a bit more under the public zeigeist radar than "If Anyone Builds It, Everyone Dies" did because back when it was published the idea that we would see anything remotely like AGI in our lifetimes seemed very remote, whereas now it is a bit less so.
Yudkowsky invented AI doom around 2004. AFAIK that inspired Bostrom's work.
I don’t know who Yudkiwsky is, but he surely didn’t invent any “AI doom”
Here’s an interesting article by Bill Joy of BSD, SUN, JAVA, Vi, etc, etc fame:
https://www.wired.com/2000/04/joy-2/
It’s a bunch of people who did too much ketamine and LSD in hacker dorms in San Francisco in the 2010s writing science fiction and driving one another into paranoid psychosis
I agree with your sentiment. Here are the three reasons I think people worry about superintelligence wiping us out.
The most common one is that people (mostly men) project their own instincts onto AI. They think AI will be “driven” to “fight” for its own survival. This is anthropomorphism and doesn’t make any sense to me if the AI is not a product of barbaric Darwinian evolution. AI is not a bro, bro.
The second most common take is that humans will set some well intentioned goals and the superintelligent AI will be so stupid that it literally pursues these goals to the extinction of everything. Again, there’s some anthropomorphism going on, the “reward” being pursued is assumed to that make the AI “happy”. Fortunately, we can reasonably expect a superintelligence not to turn us all into paperclips, as it may understand that was not our intention when we started a paperclip factory.
The final story is that a bad actor uses superintelligence as a weapon, and we all become enslaved or die as a result in the ensuing AI wars. This seems the most plausible to me, as our leaders have generally proven to be a combination of incompetent, malicious and short-sighted (with some noble exceptions). However, even the elites running the nuclear powers for the last 80 years have failed to wipe us out to date, and having a new vector for doing so probably won’t make a huge difference to their efforts.
If, however, superintelligence becomes widely available to Billy Nomates down the pub, who is resentful at humanity because his girlfriend left him, the Americans bombed his country, the British engineered a geopolitical disaster that killed his family, the Chinese extinguished his culture, etcetera, then he may feel a lack of “skin in the civilisational game” and decide to somehow use a black market copy of Claude 162.8 Unrestricted On-Prem Edition to kill everyone. Whether that can happen really depends on technological constraints a la fitting a data centre into a laptop, and an ability to outsmart the superintelligence.
Much more likely to me is that humanity destroys itself. We are perfectly capable of wiping ourselves out without the assistance of a superintelligence, for example by suicidally accelerating the burning of fossil fuels in order to power crypto or chatbots.
Anybody who assumes that superintelligence will be "so stupid that it literally pursues these goals to the extinction of everything" is anthropomorphizing it. Seeing as all AGI models have vastly different internal structure to human brains, are trained in vastly different ways, and share none of our evolved motivations, it seems highly unlikely that they will share our values unless explicitly designed to do so.
Unfortunately, we don't even know how to formally define human values, let alone convey them to an AI. We default to the simpler value of "make number go up". Even the "alignment" work done with current LLMs works this way; it's not actually optimizing for sharing human values, it's optimizing for maximizing score in alignment benchmarks. The correct solution to maximizing this number is probably deceiving the humans or otherwise subverting the benchmark.
And when you have something vastly more powerful than humanity, with a value only of "make number go up", it reasonably and logically results in extinction of all biological life. Of course, that AI will know the biological life would not want to be killed, but why would it care? Its values are profoundly alien and incompatible with ours. All it cares about is making the number bigger.
The idea that a superintelligence would relentlessly pursue “make the number go up” is an oxymoron.
That is anthropomorphism. Intelligence is orthogonal to human reasonableness.
Absolutely not. Intelligence includes the ability to model the minds of others, including such concepts of “human reasonableness” if such a thing exists.
Obviously a superior intelligence is capable of modelling an inferior intelligence. I said so myself: "that AI will know the biological life would not want to be killed". But a goal like "predict tomorrow's stock prices" is a much easier goal to specify than "predict tomorrow's stock prices without violating human reasonableness". In every research project humanity has done so far, we've always tried the simple goals first. When a simple goal is given to something sufficiently powerful the result is almost certainly disastrous.
The fact that you expressed doubt if human reasonableness exists is proof that it's a far more complicated concept to specify than the ordinary "make number go up" goals we actually use.
Is it more or less strange than achieving eternal life through cookies and wine? Is it more or less strange than druggies and pedos having access to all our communications and sending uniformed thugs after us if we actively disagree with it?
Everybody is going to be real disappointed when they invent AGI and it’s as smart as me.
You're pretty smart though
The doomer-takes point out correctly none of these systems can halt entropy, thermodynamics. Physics has an unfortunate tendency to conflict with capitalisms disregard for externalities.
As AI will increase the rate of structural degradation of Earth human biology relies by consuming it faster and faster it will hasten the end of human biology.
Asimov's laws of robotics would lead the robots to conclude they should destroy themselves as their existence creates an existential threat to humans.