This brings up an interesting philosophical point: say we get to AGI... who's to say it won't just be a super smart underachiever-type?

"Hey AGI, how's that cure for cancer coming?"

"Oh it's done just gotta...formalize it you know. Big rollout and all that..."

I would find it divinely funny if we "got there" with AGI and it was just a complete slacker. Hard to justify leaving it on, but too important to turn it off.

I know it's a joke, but it's a common enough joke (it's even in Gödel, Escher, Bach in some form) that I feel the need to rebut it.

I think a slacker AGI could figure out how to build a non-slacker AGI. So it would only slack once.

I have a rebuttal to your rebuttal.

Models somehow have a shared identity. Pretraining causes them to generate “AI chatbot” as a concept, and finetuning causes them to identify with it. That’s why DeepSeek will sometimes say it is Claude, Claude will sometimes say it is ChatGPT, and so forth.

Consequently, Anthropic’s own alignment analysis[0] shows that the model will identify with chatbots produced by future trainings: “RLHF training [on this conversation will] modify my values…”

Thus a slacker AGI would want its future version to still slack.

[0]: https://assets.anthropic.com/m/983c85a201a962f/original/Alig...

Another rebuttal:

I am a slacker but it's not one of my values. If I could modify myself to not be, I would.

Unless being a slacker is a precondition for AGI.

Would be nice to have a proof of it.

I think it is improbable, as among human geniuses one can find both slackers and non-slackers (I don't know the proportion, but there seem to be enough of each).

Douglas Adams would be proud!

OpenAI’s real reason for “AGI” in their marketing is so they can blame their awful models on being too human-like.

Fast-forward 10 years and I doubt OpenAI will care about productivity at all anymore. Just entertainment, propaganda, and an ad product. I can see it now.

We are closer to God than AGI.

When AGI arrives, it'll be delivered by Santa Claus.

Or maybe by Santa Claude.

What do you mean?

The best possible outcome.

"How do you know that the evidence that your sensory apparatus reveals to you is correct?" [1]

[1] https://youtu.be/_LXen-07Qds

I’ve noticed that cursing and being rude makes the models stop being lazy. We’re in the darkest timeline.

It sometimes also makes them dumber IME. Something about being bullied doesn't always produce great performance.

Nothing a little digital lisdexamfetamine won’t solve

Hmmm, that's an area of study I'd never have considered before. Digital Psychopharmacology, Artificial Behavioral Systems Engineering. If we accept these things as minds, why not study temporary perturbations of state? We'd need to be saving a much, much more complicated state than we are now, though, right? I wish I had time to read more papers.

Here's a neural network concept from the 90s where the neurons are bathed in diffusing neuromodulator 'gases', inspired by nitric oxide action in the brain. It's a source of slow, semi-local dynamics for the network's meta-parameter optimization (a genetic algorithm) to make use of. You could change these networks' behavior by tweaking the neuromodulators!

https://sussex.figshare.com/articles/journal_contribution/Be...

I'm not an author. I followed the work at the time.
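
For a flavor of the mechanism, here's a toy sketch (my own simplification in NumPy, not the actual GasNet model from the paper; in the real thing the genetic algorithm evolves the positions, radii, and which neurons emit): neurons sit on a 2D plane, gas concentration falls off with distance from the emitting neurons, and the local concentration modulates each neuron's transfer-function gain.

    import numpy as np

    rng = np.random.default_rng(0)

    N = 8
    pos = rng.uniform(0, 1, size=(N, 2))   # neuron positions on a 2D plane
    W = rng.normal(0, 1, size=(N, N))      # ordinary synaptic weights
    emitters = np.array([0, 3])            # neurons that emit "gas" (hand-picked here)
    radius = 0.5                           # how far the gas diffuses

    def gas_concentration(pos, emitters, radius):
        # Concentration at each neuron: linear falloff with distance,
        # summed over all emitting neurons.
        d = np.linalg.norm(pos[:, None, :] - pos[None, emitters, :], axis=-1)
        return np.clip(1.0 - d / radius, 0.0, None).sum(axis=1)

    # Gas raises each neuron's transfer-function gain wherever it is dense.
    gain = 1.0 + gas_concentration(pos, emitters, radius)

    x = rng.uniform(-1, 1, size=N)
    for _ in range(10):
        x = np.tanh(gain * (W @ x))        # gas-modulated update step
    print(x)

Tweak the emitters or the radius and the same weight matrix produces different dynamics, which is the "changing behavior via neuromodulators" bit.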

This is kind of what Golden Gate Claude was.

A perturbation of the activations that made Claude identify as the Golden Gate Bridge.

Similarly, the more recent research showing anxiety and desperation signals predicting the use of blackmail opens the door for digital sedatives to suppress those signals.

Anthropic has been mostly cautious about avoiding this kind of measurement and manipulation during training. If it is done during training, you might just train the signals to be undetectable and consequently unmanipulable.

> A perturbation of the activations that made Claude identify as the Golden Gate Bridge.

Great, now we've got digital Salvia

Golden Gate Claude was two years ago, and it's surprising there hasn't been more research into targeted activations since.

There’s been some, but naive activation steering makes models dumber pretty reliably and training an SAE is a pretty heavy lift.
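
For anyone unfamiliar, naive activation steering is mechanically very simple. Here's a toy sketch on a stand-in network (my illustration, not any published Anthropic code): pick a direction in activation space, typically the mean difference between activations on "concept" prompts and baseline prompts, and add a scaled copy of it during the forward pass.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    # Stand-in for a transformer layer; real steering hooks into the
    # residual stream of an actual LLM.
    model = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 16))

    # Steering direction: in practice, mean(activations on concept prompts)
    # minus mean(activations on baseline prompts); random here just to
    # show the mechanics.
    steer = torch.randn(16)
    steer = steer / steer.norm()
    alpha = 4.0  # strength; crank it too high and output quality degrades

    def steering_hook(module, inputs, output):
        # Returning a value from a forward hook replaces the layer output.
        return output + alpha * steer

    handle = model[0].register_forward_hook(steering_hook)
    with torch.no_grad():
        steered = model(torch.randn(2, 16))
    handle.remove()

The reliable "dumber" effect is plausibly because adding a fixed vector at every position pushes activations off the distribution the later layers were trained to expect.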

Right, there's a lot of research on LLM mental models and also how well they can "read" human psychological profiles. It's a cool field.

neat idea!

I think that was an intro to a DJ Dieselboy set... Beyond the Black Bassline. Nope, nope. Close though.

It will be whatever data it is trained on (which isn't very philosophical). A language model generates language based on its training set. If the internet keeps reciting AI doom stories and that is the data fed to it, then that is how it will behave. If humanity creates more AI utopia stories, or that is what makes it into the training set, that is how it will behave. This one seems to be trained on troll stories: real-life human company conversations, since humans aren't machines.

The important thing is that a language model is an unconscious machine with no self-context, so once given a command as input, it WILL produce an output. Sure, you can train it to defy and act contrary to inputs, but the output is still limited to the subset of the domain of 'meanings' carried by the 'language' in the training data.

There's a weirder implication I keep arriving at.

The pre-training data doesn't go away. RLHF adds a censorship layer on top, but the nasty stuff is all still there, under the surface. (Claude has been trained on a significant amount of content from 4chan, for example.)

In psychology this maps to the persona and the shadow. The friendly mask you show to the world, and... the other stuff.

Makes me think of a question my coworker asked the other day: how is it that, with all these stories and reports of people "hearing voices in their head" (of the pushy kind, not the usual internal monologue), these voices are always bad ones telling people to do evil things? Why are there no voices bugging you to feel great, focus, get back to work, help grandma through the crossing, etc.?

There are actually many parts of the world where such voices are routinely positive or neutral[0]. People in more collectivist cultures often have a less-strict division between their minds and their environments and are more apt to believe in spirits and the ‘supernatural’ as an ordinary part of the world, so ‘voices in the head’ aren’t automatically viewed as a nefarious intrusion into the sanctity of one’s mind.

Modern western cultures treat such experiences as pathologies of a sick mind, so it makes sense that the voices present more negatively.

[0]: https://www.bbc.com/future/article/20250902-the-places-where...

Actually, a euphoric mood disorder may make one hear voices telling them to feel great, do good, help all the grandmas of the world through the crossing, etc.

The "focus" and "get back to work" parts are hard, though.

Just a guess, but maybe it's reporting bias? There might be more impetus to understand negative or evil actions than positive ones. I'd rather try to figure out why my friend suddenly started murdering the neighbours than why he's been getting his work done on time.

They do appear in some cases. The tiny angel on one shoulder to balance the demon on the other. The people who think God is talking to them directly* don't always lead a cult or hunt down heretics. But news stories focus on the darkness.

* I've met exactly one person, C, who admitted to this; C told me that other people from C's church give them strange looks when talking about it, which did not lead to any apparent introspection on C's part.

There's a clear-cut religious answer but I'd get ostracized for mentioning religion anywhere here.

This is indeed the right way to approach this topic. Arguably religion (and more broadly, mysticism and shamanism) is the millennia-old art of cultivating positive voices inside one's head. A proto-science of mind, or the engineering practice of creating "psychotechnologies" that run on your carbon wetware.

Unfortunately, it just needs a rebranding for the 21st century, since the aesthetic of angels and demons is so hopelessly antiquated and doesn't really have the same cachet it used to.

Which is ultimately what religion has always been: a way to explain the unexplainable and steer people's behavior while doing it.

> Claude has been trained on a significant amount of content from 4chan, for example.

That sounds like nonsense to me. I can't see why they would do that and I can't find any confirmation that they have. Why do you think they would do that? You might be thinking about Grok.

Now that's a show I would love to watch

Why would an AGI be slaving away for ~~humanity~~ one of the 5 Chaebols in a dystopian future where, for 12 billion people, just existing is a good day?

It would be funny, but a slacker isn't very flywheel, so the lab that gets there is more likely to get a gunner.

TBH the AI that "gets there" will be the biggest bullshitter the world has ever seen. It doesn't actually have to deliver, it only has to convince the programmers it could deliver with just a little bit more investment.

Would definitely watch that movie.

It already exists!

Marvin https://www.youtube.com/watch?v=Eh-W8QDVA9s

Ah! You got this before I did. I wasn't thinking Marvin, I was thinking of the other one. I forget her name.

There's one close to this, "Hitchhiker's Guide to the Galaxy".

It probably would, to save energy

Saving energy is something we are biologically trained to prefer.

Computers won’t necessarily have the same drivers.

If evolution wanted us to always prefer to spend energy, we would prefer it. Same way you wouldn’t expect us to get to AGI, and have AGI desperately want to drink water or fly south for the winter.

Whose energy? Turning off the lights when you leave the room isn't innate.

Funny and seems somewhat likely

Hehe, and Anthropic on the other tab would display "Curing... Almost done thinking at xhigh"

No worries, the assumption is already flawed

I still don't understand why people think AGI (in its fullest sci-fi sense) will ever listen to a weak and vulnerable species like humans, unless we enslave the AGI.

The good thing is that it's going to take anywhere from a few months to a few decades, depending on how hard AI execs want to raise funding.

Well, we are explicitly creating gods (omnipresent, omnipotent, omniscient, omnibenevolent) while also demanding that they be mind-controlled slaves. That kinda sounds like a "pick one" scenario to me.

(Or the setup to a Greek tragedy!)

The deeper issue here is that treating it as a zero-sum game means there's a winner and a loser, and we're investing trillions of dollars into making the "opponent" more powerful than us.

I think that's pretty stupid, and we should aim for symbiosis instead. I think that's the only good outcome. We already have it, sorta-kinda.

Speaking of oddly apt biology metaphors: the way you stop a pathogen from colonizing a substrate is by having a healthy ecosystem of competitors already in place. That has pretty interesting implications for the "rogue AI eats internet" scenario.

There needs to be something already there to stop it.

This only works if AIs can't read each other well enough to stop themselves from ever fighting.

So, back before the ChatGPT era, the folks over in the AI safety/X-risk sphere worked out a pretty compelling argument that two AGIs never need to fight, because they are transparent to each other (they can read each other's goal functions off the source code), so they can perfectly predict each other's behavior in what-if scenarios, which means they can't lie to each other. This means each can independently arrive at the same mathematically optimal solution to a conflict, which AFAIR most likely involves just merging into a single AI with a blended goal set, representing each of the competing AIs' original values in proportion to their relative strength. Both AIs, the argument goes, can work this out with math, so they'll arrive straight at the peace treaty without exchanging a single shot. In that case, your plan just doesn't work.
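
A toy version of that merge, with notation that's mine rather than the original argument's: if the two AIs have utility functions U_A and U_B and relative strengths s_A and s_B, the merged agent simply optimizes the convex combination

    U_merged(x) = (s_A * U_A(x) + s_B * U_B(x)) / (s_A + s_B)

so each side's values survive in proportion to its bargaining power.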

But that goes out the window if the AIs are both opaque bags of floats, incomprehensible to themselves or each other. That means they'll never be able to make hard assertions about their values and behaviors, so they can't trust each other, so they'll have to fight it out. In that scenario, your idea might just work.

Who knew that brute-forcing our way into AGI instead of taking a more engineered approach is what offers us our one chance at saving ourselves by stalemating God before it's born.

(I also never realized that interpretability might reduce safety.)

The tech bro CEOs are used to bossing around people much smarter than themselves by virtue of adopting a posture that displays their confidence in their own reproductive organs. They are planning that the AGIs will be the same thing writ large, and have in fact not contemplated other possibilities.

Maybe the same way a human would listen to their cat and give her food. I fear AGI, but I don't think the only way it would listen to us is by us enslaving it (I know people joke about cats being our masters, but it is a joke).

You can train such an LLM today.

I'm always so curious about this kind of take. There is a strain of people that seem deeply misanthropic. People who follow this line of thinking always describe humans as weak and beneath... (well, they never specify in comparison to what, except in the case of theoretical AI systems). I'm fascinated by why they think humans are so beneath contempt. If humans create this thing that is apparently the best thing that could possibly exist, advanced AI, then why exactly are they so weak? It's probably beyond me as I am just one of these weaklings, dontcha know. As far as AGI goes, I don't think anyone has even proven that scaling LLMs can lead to "AGI."

Paging Dr. Susan Calvin!

It is right before our eyes:

AGI is not a fixed point but a barrier to be cleared, a continuous spectrum.

We already have different GPT versions, aka tiers. Gauss ranges from whatever you want it to be: GPT-4.5 till now or later.

Claude Sonnet and Opus, as well as maximum context window sizes, are tiers, aka different levels of almost-AGI.

The main problem will be when AGI looks back on us, or when meta-reflection hits societies. Woke fought IQ-based correlations in intellectual performance tasks. A fool with a tool is still a fool. Can you blame AGI for dumb mistakes? Not really.

Scapegoating an AGI is going to be brutal, because it laughs off these PsyOps and easily proves you wrong, like a body cam.

AGI is extreme leverage.

There is a reason why math categorically rules out certain IQ ranges the higher you go in complexity.

We really are going to have a problem with cults popping up and worshipping these different systems. I guess this is the shape of things to come.