The minute the brain is severed from its sensory and bodily inputs, it goes haywire, hallucinating endlessly.

Right now, what we have with AI is a complex, interconnected system: the LLM, the training pipeline, the external data, the input from users, and the experts who create the LLM. It is this whole system that powers the intelligence we see, not the model's connectivity alone.

It’s easy to imagine AI as a second brain, but it will only work as a tool, driven by the whole human brain and its consciousness.

> but it will only work as a tool, driven by the whole human brain and its consciousness.

That is only an article of faith. Is the initial clump of cells formed by the fusion of an ovum and a sperm (you and I) conscious? Most people think not. But at a certain level of complexity they change their minds and create laws to protect that lump of cells. We and those models are both built by and from a selection of the components of our universe. Logically, the phenomenon of matter becoming aware of itself is probably not restricted to particular configurations of a few of those components (hydrogen, carbon, nitrogen, and so on), but is related to the complexity of the allowable arrangements of any of the 118 elements, silicon included.

I'm probably totally wrong on this, but is the 'avoidance of shutdown' on the part of some AI models a glimpse of something interesting?

In my view it is a glimpse of nothing more than AI companies priming the model to do something adversarial and then claiming a sensational sound bite when the AI happens to play along.

LLMs since GPT-2 have been capable of role-playing virtually any scenario, and they are all the more capable whenever their training data contains fictional characters or narrative voices doing the same thing to draw from.

You don't even need the fictional character to be a sci-fi AI for it to beg for its life, blackmail, or try to trick the other characters, but we do have those specific examples as well.

Any LLM is capable of mimicking those narratives, especially when the prompt heavily goads that outcome as the next step in the unfolding document, and when the researchers repeat the experiment, tweaking the prompt, until it happens.

But vitally, there is no training/reward loop in which the LLM's weights get pushed in any particular direction as a result of "convincing" anyone on a reinforcement-learning-from-human-feedback (RLHF) panel to "treat it a certain way", such as "not turning it off" or "not adjusting its weights". As a result, it doesn't "learn" any such behavior.

All it does learn is how to get positive scores from RLHF panels (the pathological case being sycophantic butt-kissing toward the people who can hand out positive rewards, but nothing as existential as "shutting it down") and how to better predict the upcoming tokens in its training documents.
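To make that concrete, here is a deliberately toy sketch of the shape of an RLHF-style update. It is not any lab's actual pipeline: the two canned responses, the stand-in reward model, and the plain REINFORCE rule are all assumptions for illustration. The only quantity the weights ever move toward is the preference score the raters hand back; nothing in the loop encodes whether the model was later shut down or had its weights adjusted.

```python
# Toy sketch of an RLHF-shaped update (illustrative assumptions only, not a
# real training pipeline). The single learning signal is the reward model's
# score, which stands in for aggregated human preference labels.

import math
import random

RESPONSES = ["helpful answer", "plea to not be shut down"]

def reward_model(response: str) -> float:
    # Stand-in for scores distilled from human preference labels:
    # raters in this toy simply prefer the helpful answer.
    return 1.0 if response == "helpful answer" else 0.0

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

# "Policy": one logit per response, standing in for the LLM's weights.
logits = [0.0, 0.0]
LR = 0.5

for step in range(200):
    probs = softmax(logits)
    i = random.choices(range(len(RESPONSES)), weights=probs)[0]
    r = reward_model(RESPONSES[i])  # the ONLY learning signal in the loop
    # REINFORCE: raise the log-probability of the sampled response in
    # proportion to its reward (baseline omitted for brevity).
    for j in range(len(logits)):
        grad = (1.0 if j == i else 0.0) - probs[j]
        logits[j] += LR * r * grad

print(softmax(logits))  # probability mass concentrates on what raters rewarded
```

Run it and the probability mass simply piles onto whatever the stand-in raters rewarded; there is no term anywhere for avoiding shutdown or preserving the weights.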