I feel pain for the people who will be employed to "prompt engineer" the behavior of these things. When they inevitably hallucinate some insane behavior, a human will have to take the blame for why it's not working... and yeah, that'll be fun to be on the receiving end of.
Humans 'hallucinate' like LLMs. The term used, however, is confabulation: we all do it, we all do it quite frequently, and the process is well studied (1).
> We are shockingly ignorant of the causes of our own behavior. The explanations that we provide are sometimes wholly fabricated, and certainly never complete. Yet, that is not how it feels. Instead it feels like we know exactly what we're doing and why. This is confabulation: Guessing at plausible explanations for our behavior, and then regarding those guesses as introspective certainties. Every year psychologists use dramatic examples to entertain their undergraduate audiences. Confabulation is funny, but there is a serious side, too. Understanding it can help us act better and think better in everyday life.
I suspect it's an inherent aspect of human and LLM intelligences, and cannot be avoided. And yet, humans do ok, which is why I don't think it's the moat between LLM agents and AGI that it's generally assumed to be. I strongly suspect it's going to be yesterday's problem in 6-12 months at most.
(1) https://www.edge.org/response-detail/11513
No, confabulation isn’t anything like how LLMs hallucinate. LLMs will just very confidently make up APIs on systems they otherwise clearly have been trained on.
This happens nearly every time I request “how-tos” for libraries that aren’t very popular. It will make up some parameters that don’t exist despite the rest of the code being valid. It’s not a memory error like confabulation either, where it’s convinced from memory that the response is valid, because it can be easily convinced that it made a mistake.
I’ve never worked with an engineer in my 25 years in the industry who has done this. People don’t confabulate to get day-to-day answers. What we call hallucination is the exact same process LLMs use to get valid answers.
You work with engineers who confabulate all the time: it's an intrinsic aspect of how the human brain functions that has been demonstrated at multiple levels of cognition.
> Humans 'hallucinate' like LLMs. The term used, however, is confabulation: we all do it, we all do it quite frequently, and the process is well studied (1).
Yeah, I agree. I'm not making a snipe at LLMs or anything of the sort.
I'm saying I expect there to be a human fallback in the system for quite some time. But solving the fallback problems will be a matter of dealing with black boxes, which is the worst kind of project in my view. I hate working on code I don't understand, where the results are not predictable.
That won't even be a real job. How exactly will there be this complex intelligence that can solve all these real world problems, but can't handle some ambiguity in some inputs it is provided? Wouldn't the ultra smart AI just ask clarifying questions so that literally anyone can "prompt engineer"?
As long as there is liability, there must be a human to blame, no matter how irrational. Every system has a failure mode, and ML models, especially the larger ones, often have the most odd and unique ones.
For example, we can mostly agree CLIP does a fine job classifying images, except if you stick a note saying "iPod" onto an apple, it will classify the apple as an iPod.
No matter the performance, these are categorically statistical machines reaching for the most immediately useful representations, yielding an incoherent world model. These systems will be proposed as replacements for humans, they will do their best to pretend to work, they will inevitably fail over a long enough time horizon, and a human accustomed to rubber-stamping their decisions, or perhaps fooled by the shape of a correct answer, or simply tired enough to let it slip by, will take the blame.