> Us having to specify things that we would never specify when talking to a human.
The first time I read that question I got confused: what kind of question is that? Why is it being asked? It should be obvious that you need your car to wash it. The fact that it is being asked in my mind implies that there is an additional factor/complication to make asking it worthwhile, but I have no idea what. Is the car already at the car wash and the person wants to get there? Or do they want to idk get some cleaning supplies from there and wash it at home? It didn't really parse in my brain.
I would say, the proper response to this question is not "walk, blablablah" but rather "What do you mean? You need to drive your car to have it washed. Did I miss anything?"
Yes, this is what irks me about all the chatbots, and the chat interface as a whole. It is a chat-like UX without a chat-like experience. Like you are talking to a loquacious autist about their favorite topic every time.
Just ask me a clarifying question before going into your huge pitch. Chats are a back & forth. You don’t need to give me a response 10x longer than my initial question. Etc
I think for "GPT-4o is my life partner" reasons, labs are a little bit icey about making the models overly human.
Doubt. The labs are afraid of users becoming too hooked on their products? lol…
People offing themselves because their lover convinced them it's time is absolutely not worth the extra addiction potential. We even witnessed this happen with OAI.
It's a fast track to public disdain and heavy handed government regulation.
Regulation would be preferable for OpenAI to the tort lawyers. In general the LLM companies should want regulation because the alternative is tort, product liability tort, and contract law.
There is no way without the protections that could be afforded by regulation to offer such wide-ranging uses of the product without also accepting significant liability. If the range of "foreseeable misuse" is very broad and deep, so is the possible liability. If your marketing says that the bot is your lawyer, doctor, therapist, and spouse in one package, how is one to say that the company can escape all the comprehensive duties that attach to those social roles. Courts will weigh the tiny and inconspicuous disclaimers against the very large and loud marketing claims.
The companies could protect themselves in ways not unlike the ways in which the banking industry protects itself by replacing generic duties with ones defined by statute and regulation. Unless that happens, lawyers will loot the shareholders.
It’s funny seeing you frame regulation as needed to protect trillion dollar monopolies from consumers and not the other way around.
Or sama is just waiting to premium subscription gate companions in some adult content package as he has hinted something along these lines may be forthcoming. Maybe tie it in with the hardware device Ive is working on. Some sort of hellscape tamogotchi.
Recall: "As part of our 'treat adult users like adults' principle, we will allow even more, like erotica for verified adults," Altman wrote in the Oct.
I'm struggling a bit when it comes to wording this with social decorum, but how long do we reckon it takes until there's AI powered adult toys? There's a market opportunity that i do not want to see being fulfilled, ever..
I did work on a supervised fine-tuning project for one of the major providers a while back, and the documentation for the project was exceedingly clear about the extent to which they would not tolerate the model responding as if it was a person.
Some of the labs might be less worried about this, but they're not by any means homogenous.
> Like you are talking to a loquacious autist about their favorite topic every time
That's the best part.
People need to touch grass
People need to smoke grass and chill out.
With ChatGPT, at least, you can tell the bot to work that way using [persistent] Custom Instructions, if that's what you want. These aren't obeyed perfectly (none of the instructions are, AFAICT), but they do influence behavior.
A person can even hammer out an unstructured list of behavioral gripes, tell the bot to organize them into instructional prose, have it ask clarifying questions and revise based on answers, and produce directions for integrating them as Custom Instructions.
From then on, it will invisibly read these instructions into context at the beginning of each new chat.
Mold it and steer it to be how you want it to be.
(My own bot tends to be very dry, terse, non-presumptuous, pragmatic, and profane. It's been years now since it has uttered an affirmation like "That's a great idea!" or "Wow! My circuits are positively buzzing with the genius I'm seeing here!" or produced a tangential dissertation in response to a simple question. But sometimes it does come back with functional questions, or phrasing like "That shit will never work. Here's why.")
This. Nailed it.
>You don’t need to give me a response 10x longer than my initial question.
Except, of course, when that is exactly what the user wants.
To me that’s not a chat interface, that’s a search interface.
Chat is a back & forth.
Search is a one-shot.
That’s why I don’t understand why LLMs don’t ask clarifying questions more often.
In a real human to human conversation, you wouldn’t simply blurt out the first thing that comes to mind. Instead, you’d ask questions.
Google Gemini often gives an overly lengthy response, and then at the end asks a question. But the question seems designed to move on to some unnecessary next step, possibly to keep me engaged and continue conversing, rather than seeking any clarification on the original question.
This is a great point, because when you ask it (Claude) if it has any questions, it often turns out it has lots of good ones! But it doesn't ask them unless you ask.
That's because it doesn't really have any questions until you ask it whether it does.
This is the most important comment in this entire thread IMO, and it’s a bit buried.
This is the fundamental limitation with generative AI. It only generates, it does not ponder.
You can define "ponder" in multiple ways, but really this is why thinking models exist - they turn over the prompt multiple times and iterate on responses to get to a better end result.
Well I chose the word “ponder” carefully, given the fact that I have a specific goal of contributing to this debate productively. A goal that I decided upon after careful reflection over a few years of reading articles and internet commentary, and how it may affect my career, and the patterns I’ve seen emerge in this industry. And I did that all patiently. You could say my context window was infinite, only defined by when I stop breathing.
That is to say, all of that activity I listed is activity I’m confident generative AI is not capable of, fundamentally.
Like I said in a cousin comment, we can build Frankenstein algorithms and heuristics on top of generative AI but every indication I’ve seen is that that’s not sufficient for intelligence in terms of emergent complexity.
Imagine if we had put the same efforts towards neural networks, or even the abacus. “If I create this feedback loop, and interpret the results in this way, …”
Agreed that feedback loops on top of generative LLMs will not get us to AGI or true intelligence.
what is the difference between "ponder" and "generate"? the number of iterations?
Probably the lack of external stimuli. Generative AI only continues generating when prompted. You can play games with agents and feedback loops but the fundamental unit of generative AI is prompt-based. That doesn’t seem, to me, to be a sufficient model for intelligence that would be capable of “pondering”.
My take is that an artificial model of true intelligence will only be achieved through emergent complexity, not through Frankenstein algorithms and heuristics built on generative AI.
Generative AI does itself have emergent complexity, but I’m bearish that if we would even hook it up to a full human sensory input network it would be anything more than a 21st century reverse mechanical Turk.
Edit: tl;dr Emergent complexity is a necessary but insufficient criteria for intelligence
you can get it to change by putting instructions to ask questions in the system prompt but I found it annoying at a while
Because 99% of the time it's not what users want.
You can get it to ask you clarifying questions just by telling it to. And then you usually just get a bunch of questions asking you to clarify things that are entirely obvious, and it quickly turns into a waste of time.
The only time I find that approach helpful is when I'm asking it to produce a function from a complicated English description I give it where I have a hunch that there are some edge cases that I haven't specified that will turn out to be important. And it might give me a list of five or eight questions back that force me to think more deeply, and wind up being important decisions that ensure the code is more correct for my purposes.
But honestly that's pretty rare. So I tell it to do that in those cases, but I wouldn't want it as a default. Especially because, even in the complex cases like I describe, sometimes you just want to see what it outputs before trying to refine it around edge cases and hidden assumptions.
This is a topic that I’ve always found rather curious, especially among this kind of tech/coding community that really should be more attuned to the necessity of specificity and accuracy. There seems to be a base set of assumptions that are intrinsic to and a component of ethnicities and cultures, the things one can assume one “wouldn’t never specify when talking to a human [of one’s own ethnicity and culture].”
It’s similar to the challenge that foreigners have with cultural references and idioms and figurative speech a culture has a mental model of.
In this case, I think what is missing are a set of assumptions based on logic, e.g., when stating that someone wants to do something, it assumes that all required necessary components will be available, accompany the subject, etc.
I see this example as really not all that different than a meme that was common among I think the 80s and 90s, that people would forget buying batteries for Christmas toys even though it was clear they would be needed for an electronic toy. People failed that basic test too, and those were humans.
It is odd how people are reacting to AI not being able to do these kinds of trick questions, while if you posted something similar about how you tricked some foreigners you’d be called racist, or people would laugh if it was some kind of new-guy hazing.
AI is from a different culture and has just arrived here. Maybe we’re should be more generous and humane… most people are not humane though, especially the ones who insist they are.
Frankly, I’m not sure it bodes well for if aliens ever arrive on Earth, how people would respond; and AI is arguably only marginally different than humans, something an alien life that could make it to Earth surely would not be.
Whether you view the question as nonsensical, the most simple example of a riddle, or even an intentional "gotcha" doesn't really matter. The point is that people are asking the LLMs very complex questions where the details are buried even more than this simple example. The answers they get could be completely incorrect, flawed approaches/solutions/designs, or just mildly misguided advice. People are then taking this output and citing it as proof or even objectively correct. I think there are ton of reasons this could be but a particularly destructive reason is that responses are designed to be convincing.
You _could_ say humans output similar answers to questions, but I think that is being intellectually dishonest. Context, experience, observation, objectivity, and actual intelligence is clearly important and not something the LLM has.
It is increasingly frustrating to me why we cannot just use these tools for what they are good for. We have, yet again, allowed big tech to go balls deep into ham-fisting this technology irresponsibly into every facet of our lives the name of capital. Let us not even go into the finances of this shitshow.
Yeah people are always like "these are just trick questions!" as though the correct mode of use for an LLM is quizzing it on things where the answer is already available. Where LLMs have the greatest potential to steer you wrong is when you ask something where the answer is not obvious, the question might be ill-formed, or the user is incorrectly convinced that something should be possible (or easy) when it isn't. Such cases have a lot more in common with these "nonsensical riddles" than they do with any possible frontier benchmark.
This is especially obvious when viewing the reasoning trace for models like Claude, which often spends a lot of time speculating about the user's "hints" and trying to parse out the intent of the user in asking the question. Essentially, the model I use for LLMs these days is to treat them as very good "test takers" which have limited open book access to a large swathe of the internet. They are trying to ace the test by any means necessary and love to take shortcuts to get there that don't require actual "reasoning" (which burns tokens and increases the context window, decreasing accuracy overall). For example, when asked to read a full paper, focusing on the implications for some particular problem, Claude agents will try to cheat by skimming until they get to a section that feels relevant, then searching directly for some words they read in that section. They will do this even if told explicitly that they must read the whole paper. I assume this is because the vast majority of the time, for the kinds of questions that they are trained on, this sort of behavior maximizes their reward function (though I'm sure I'm getting lots of details wrong about the way frontier models are trained, I find it very unlikely that the kinds of prompts that these agents get very closely resemble data found in the wild on the internet pre-LLMs).