Hacker News

As far as I understand ChatGPT is not capable of self-improvement, so EY's argument is not applicable to it. (At least based on this https://intelligence.org/files/IEM.pdf from 2013.)

The FOOM argument starts with some kind of goal-directed agent (that escapes and then it) starts building a more capable version of itself (and then goal drift might set in might not)

If you tell ChatGPT to build ChatGPT++ and leave currently there's no time horizon within it would accomplish either that or escape, or anything, because now it gives you tokens rendered on some website.

The argument is not that AI is a magic box.

- The argument is that if there's a process that improves AI. [1]

- And if during that process AI becomes so capable that it can materially contribute to the process, and eventually continue (un)supervised. [2]

- Then eventually it'll escape and do whatever it wants, and then eventually the smallest misalignment means we become expendable resources.

I think the argument might be valid logically, but the constant factors are very important to the actual meaning and obviously we don't know them. (But the upper and lower estimates are far. Hence the whole debate.)

[1] Look around, we have a process that's like that. However gamed and flawed we have METR scores and ARC-AGI benchmarks, and thousands of really determined and skillful people working on it, hundreds of billions of capital deployed to keep this process going.

[2] We are not there yet, but decades after peak oil arguments we are very good at drawing various hockey stick curves.

(1) You'd be surprised just how much of Claude, ChatGPT, etc. is essentially vibe coded. They're dog-fooding agentic coding in the big labs and have been for some time.

(2) It is quite trivial to Ralph Wiggam improvements to agentic tools. Fetch the source code to Claude Code (it's minimized, but that never stopped Claude) or Codex into a directory, then run it in a loop with the prompt "You are an AI tool running from the code in the current directory. Every time you finish, you are relaunched, acquiring any code updates that you wrote in the last session. Do whatever changes are necessary for you to grow smarter and more capable."

Will that work? Hell no, of course it won't. But here's the thing: Yudkowsky et al predicted that it would. Their whole doomer if-you-build-it-everybody-dies argument is predicated on this: that take-off speeds would be lightning fast, as a consequence of exponentials with a radically compressed doubling time. It's why EY had a total public meltdown in 2022 after visiting some of the AI labs half a year before the release of ChatGPT. He didn't even think we would survive past the end of the year.

Neither EY nor Bostrom, nor anyone in their circle are engineers. They don't build things. They don't understand the immense difficulty of getting something to work right the first time, nor how incredibly difficult it is to keep entropy at bay in dynamical systems. When they set out to model intelligence explosions, they assumed smooth exponentials and no noise floor. They argued that the very first agent capable of editing its own source code as good as the worst AI researchers, would quickly bootstrap itself into superintelligence. The debate was whether it would take hours or days. This is all in the LessWrong archives. You can go find the old debates, if you're interested.

To my knowledge, they have never updated their beliefs or arguments since 2022. We are now 3 years past the bar they set for the end of the world, and things seem to be going ok. I mean, there's lots of problems with job layoffs, AI used to manipulate elections, and slop everywhere you look. But Skynet didn't engineer a bioweapon or gray goo to wipe out humanity - which is literally what they argued would be happening two years ago!