A human will tell you “I am not sure, and will have to ask engineering and get back to you in a few days.” None of these LLMs do that yet; they’re biased toward giving some answer, any answer.

I agree with you, but man, I can't help but feel humans are the same, depending on the company. My wife was recently fighting with several layers of Comcast support over data cap changes they've recently made. It seems to be a data issue, since it's something new that theoretically hasn't propagated through their entire support chain yet, but she encountered half a dozen confidently incorrect people who lacked the information or training to know they were wrong. It was a very frustrating couple of hours.

Generally I don't trust most low-paid (through no fault of their own) customer service centers any more than I do random LLMs. Historically, their advice on most things has been very biased, flatly wrong, or often both.

In the case of unhelpful human support, I can draw on my experience communicating with other humans to tell whether I'm being understood. An LLM is much more trial and error: I can't model a theory of mind behind its answers to tell whether I'm just communicating poorly or something else is being lost in translation. There is no mind at play.

That's fair, though with an LLM (at least one you're familiar with) you can shape its behavior. That's not so different from human support, where the agent follows some black-box script I can't control or reason my way through. Granted, the LLM will have the same stupid black-box script, so in both cases it's weaponized stupidity against the consumer.

This is not really true. If you give a decent model docs in the prompt and tell it to answer based on the docs and say “I don’t know” if the answer isn’t there, it does (most of the time).
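A minimal Python sketch of that setup, for illustration only: the prompt wording, the `<doc>` tags, and the `REFUSAL` marker are assumptions rather than any tool's or vendor's recipe, and no model is actually called. Even with a prompt like this, the refusal only happens when the model follows the instruction.

```python
# Minimal sketch of a "docs-only, refusal allowed" prompt.
# The wording, the <doc> tags and the REFUSAL marker are illustrative
# assumptions, not taken from any particular tool or vendor guide.
REFUSAL = "I don't know"


def build_grounded_prompt(docs: list[str], question: str) -> str:
    """Docs first, then an instruction restricting answers to them."""
    # Number each document so an answer can point back to its source.
    doc_block = "\n\n".join(
        f"<doc id={i}>\n{text}\n</doc>" for i, text in enumerate(docs, start=1)
    )
    instructions = (
        "Answer the question using ONLY the documents above. "
        f"If the answer is not in them, reply exactly: {REFUSAL}"
    )
    return f"{doc_block}\n\n{instructions}\n\nQuestion: {question}"


def is_refusal(answer: str) -> bool:
    """Cheap post-check so calling code can treat a refusal as 'no answer'."""
    return answer.strip().lower().startswith(REFUSAL.lower())


prompt = build_grounded_prompt(
    ["First document text.", "Second document text."],
    "How can I write a revset to select the nearest bookmark?",
)
print(prompt)
print(is_refusal("I don't know."))  # True
```

The post-check matters as much as the prompt: it lets the calling code treat a refusal as a result instead of as an answer to show the user.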

> most of the time

This is doing some heavy lifting.

I have never seen this in the wild. Have you?

Yes. All the time. I wrote a tool that does it!

https://crespo.business/posts/llm-only-rag/

  $ rgd ~/repos/jj/docs "how can I write a revset to select the nearest bookmark?"

  Using full corpus (length: 400,724 < 500,000)

  # Answer

  gemini-2.5-flash  | $0.03243 | 2.94 s | Tokens: 107643 -> 56

  The provided documentation does not include a direct method to select the
  nearest bookmark using revset syntax. You may be able to achieve this using
  a combination of ancestors(), descendants(), and latest(), but the
  documentation does not explicitly detail such a method.
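The line `Using full corpus (length: 400,724 < 500,000)` suggests the tool checks the corpus size before pasting all of it into the prompt. Below is a hypothetical Python sketch of such a gate. It is not `rgd`'s actual code: the 500,000 limit comes from that output line, while measuring length in characters, the `*.md` filter, and the helper names are assumptions.

```python
from pathlib import Path

# Limit copied from the rgd output above. Treating "length" as a character
# count is an assumption (400,724 chars vs. ~107k input tokens fits that reading).
MAX_CORPUS_CHARS = 500_000


def load_corpus(root: str, pattern: str = "*.md") -> str:
    """Concatenate every matching file under root, each prefixed with its path."""
    parts = []
    for path in sorted(Path(root).rglob(pattern)):
        parts.append(f"--- {path} ---\n{path.read_text(encoding='utf-8')}")
    return "\n\n".join(parts)


def fits_in_context(corpus: str, limit: int = MAX_CORPUS_CHARS) -> bool:
    """True when the whole corpus is small enough to paste into one prompt."""
    return len(corpus) < limit


def describe(corpus: str, limit: int = MAX_CORPUS_CHARS) -> str:
    """Report which path the (hypothetical) tool would take."""
    if fits_in_context(corpus, limit):
        return f"Using full corpus (length: {len(corpus):,} < {limit:,})"
    return f"Corpus too large ({len(corpus):,} >= {limit:,}); needs a narrowing step"
```

Usage would be something like `print(describe(load_corpus("path/to/docs")))`. What the real tool does when the corpus is over the limit isn't shown in the output, so the sketch only reports it.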

I need a big ol' citation for this claim, bud, because it's an extraordinary one. LLMs have no concept of truth or theory of mind, so any time one tells you "I don't know," all it tells you is that similar questions with the answer "I don't know" were already in the training data.

If the training data is full of confident statements, you'll get confident-sounding statements out of the model too, even for things that are only similar, and even when the answers are total bullshit.

Do you use LLMs often?

I get "I don't know" answers from Claude and ChatGPT all the time, especially now that they have thrown "reasoning" into the mix.

Saying that LLMs can't say "I don't know" feels like a 2023-2024 era complaint to me.

Ok, how? The other day Opus spent 35 of my dollars by throwing itself again and again at a problem it couldn't solve. How can I get it to instead say "I can't solve this, sorry, I give up"?

That sounds slightly different from "here is a question, say I don't know if you don't know the answer" - sounds to me like that was Opus running in a loop, presumably via Claude Code?

I did have one problem (involving SQLite triggers) that I bounced off various LLMs for genuinely a full year before finally getting to an understanding that it wasn't solvable! https://github.com/simonw/sqlite-chronicle/issues/7

It wasn't in a loop really, it was more "I have this issue" "OK I know exactly why, wait" $3 later "it's still there" "OK I know exactly why, it's a different reason, wait", repeat until $35 is gone and I quit.

I would have much appreciated it if it had just thrown up its hands and said it didn't know.

I solve this in my prompt. I say: if you can’t fix it in two tries, look online for how to do it; if you still can’t fix it after two more tries, pause and ask for my help. It works pretty well.
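A rule like that lives in the prompt, so the model can still ignore it. If you drive the agent from your own script, the same budget can be enforced outside the model. Here is a hypothetical Python sketch: `attempt` and `lookup` are stand-ins for one agent turn and a web search (not a real Claude Code API), and the 2 + 2 split of tries is one reading of the rule.

```python
from typing import Callable, Optional


# Hypothetical escalation policy: a few plain attempts, then a few attempts
# informed by a lookup, then stop and hand the problem back to the human.
def fix_with_budget(
    attempt: Callable[[Optional[str]], bool],
    lookup: Callable[[], str],
    tries_before_lookup: int = 2,
    tries_after_lookup: int = 2,
) -> str:
    # Phase 1: try without outside help.
    for _ in range(tries_before_lookup):
        if attempt(None):
            return "fixed"
    # Phase 2: search once, then retry with the result as a hint.
    hint = lookup()
    for _ in range(tries_after_lookup):
        if attempt(hint):
            return "fixed"
    # Budget spent: stop instead of burning more money.
    return "paused: asking the human for help"


# Usage with a stub that never succeeds.
calls: list[Optional[str]] = []


def always_fails(hint: Optional[str]) -> bool:
    calls.append(hint)
    return False


print(fix_with_budget(always_fails, lambda: "search result"))
# paused: asking the human for help
print(calls)  # [None, None, 'search result', 'search result']
```

Keeping the cap in the harness means it holds even when the model is sure it knows "exactly why" on every attempt.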

Won't that be cool, when LLM-based AIs ask you for help instead of the other way around?

You’re right that some humans will, and most LLMs won’t. But humans can be just as confidently wrong. And we incentivize them to make decisions quickly, in a way that costs the company less money.