LLM failures go viral because they trigger a "Schadenfreude" response to automation anxiety. If the oracle can't do basic logic, our jobs feel safe for another quarter.
Wrong.
I'd say it's more that it's a startlingly clear rebuttal to the tired refrain of, "Models today are nothing like they were X months ago!" When actually, yes, they still fucking blow.
So rather than patiently explaining to yet another AI hypeman exactly how models are and aren't useful in a given workflow, and the kinds of subtle reasoning errors that lead to poor-quality outputs misaligned with long-term value, only to invariably be blamed for user incompetence or told to wait Y more months, we can instead just point to this very concise example of AI incompetence to demonstrate our frustrations.
It's only a "startlingly clear rebuttal" if you can't remember what models months ago were like.
You are right about the motivation behind the glee, but it actually has a kernel of truth in it: if this thing makes such elementary mistakes, it isn't going to be autonomous anytime soon.
Such elementary mistakes can be made by humans under the influence of a substance or with certain mental issues. That's pretty much the kind of person you wouldn't trust with a vehicle or anything important.
IMHO all entry-level clerical jobs and coding as a profession are done, but these elementary mistakes imply that people with jobs that require agency will be fine. Any non-entry-level job has a huge component of trust in it.
I think 'elementary mistakes' in humans are far more common than being confined to the mentally ill or intoxicated. There are entire shows/YT channels dedicated to grabbing a random person on the street and asking them a series of simple questions.
Often these questions are pure fact (who is the current US Vice President), but for some, the idea is that a young child can answer better than an 'average' adult. These questions often play on assumptions an adult might make that lead them astray, whereas a child/pre-teen answers correctly by having different assumptions or not assuming at all.
Presumably, even some of the worst-performing contestants on these shows (i.e. the ones selected to provide humor for audiences) have jobs that require agency. I think it's more likely that most jobs/tasks either have extensive rules (and/or refer to rules defined elsewhere, like in the legal system), or they have allowances for human error and ambiguity.
The LLM is probably also not going to launch into a rant about how they incorporate religious and racial beliefs into their life when asked about current heads of state. You ask the LLM about a solar configuration, and I think it must be exceptionally rare to have it instead tell you about its feelings on politics.
We had a big winter storm a few weeks ago, right when I received a large solar panel to review. I sent my grandpa a picture of the solar panel on its ground mount, covered in snow, noting I just got it today and it wasn't working well (he's very MAGA-y, so I figured the joke would land well). I received a straight-faced reply on how PV panels work, noting they require direct sunlight and that direct sunlight through heavy snow doesn't count; they don't tell you this when they sell these things, he says. I decided to chalk this up to being out-deadpanned and did not reply "thanks, ChatGPT."
I'm pretty sure 100% of those people would give the correct answers if they were focused, had access to the internet, and had studied the entire corpus of human knowledge.
In the case of the issue at hand, though, it is not a knowledge question; it is a logic question. No human will go to the carwash without the car unless they are intoxicated or have some issue preventing them from thinking clearly.
IMHO all that can be solved only when AI actually starts acting in place of a human, though. At this time "AI" is just an LLM that outputs something based on a single input, but a human mind operates in a very different environment than that.
I feel safe when Claude outputs dd commands that wipe your drive to benchmark disk write speed :)
At least this Schadenfreude is better than the Schadenfreude AI boosters get when people are made redundant by AI. I can totally see some of them getting warm fuzzies scrolling TikTok, watching people cry over having lost not only their job but their entire career.
I'm not even exaggerating; you can see these kinds of comments on social media.
The funny thing is that this thread has become a commercial for thinking mode, and it will probably result in more token consumption, and therefore more revenue, for AI companies.
I agree that this is more of a social media effect than an LLM effect. But I'll add that this failure mode is very repeatable, which is a precondition for its virality: a lot of people can reproduce the failure themselves. And if it isn't 100% reproducible, even better for virality; if 50% can reproduce it and 50% can't, it feeds into the polarizing "white dress / blue dress" effect.