When I look at LLMs as an interface, I'm reminded of back when speech-to-text first became mainstream. So many promises about how this is the interface for how we'll talk to computers forevermore.

Here we are a few decades later, and we don't see business units using Word's built-in dictation feature to write documents, right? Funny how that tech seems to have barely improved in all that time. And despite dictation being far faster than typing, it's not used all that often: the error rate is still too high for it to be useful, because errors in speech-to-text are a fundamentally unsolvable problem (you can only get so far with background-noise filtering, accounting for accents, etc.).

I see the parallel in how LLM hallucinations are a fundamentally unsolvable component of transformer-based models, and I suspect LLM usage in 20 years will be around the level of speech-to-text today: ubiquitous in the background, used here and there to set a timer or talk to a device, but ultimately not useful for any serious work.

Yeah this is exactly my view. We've had several years of work on the tech, and LLMs are just as prone to randomly spitting out garbage as they were the first day. They are not a tool which is fit for any serious work, because you need to be able to rely on your tools. A tool which is sometimes good and sometimes bad is worse than having no tool at all.

Did google not rely on Gemini to do their ISA changeover?

https://arxiv.org/abs/2510.14928

Was Gemini worse than no tool at all there?

Do you really think that Opus 4.6 hallucinates to exactly the same degree as GPT-3.5? I am mystified how you can hold this perspective.

If you're not seeing the hallucinations, I'd assert you're either not using it enough, or (more likely) you don't have enough knowledge in the subject matter to notice when it's hallucinating.

LMFAO does it hallucinate to the same degree as GPT 3?

Which is what was questioned.

I think there is a second reason people still type, and it's relevant to LLMs. Typing forces you to slow down and choose your words. When you want to edit, you are already typing, so it doesn't break the flow. In short, it has a fit to the work that speech-to-text doesn't.

LLMs create a new workflow wherever they are employed. Even if capable, that is not always a more desirable/efficient experience.

I type faster than I think, and being able to edit gives typing the edge over speech-to-text. I don't believe the two are fundamentally comparable.

I'm curious about the statement that hallucinations are "fundamentally unsolvable". I don't think an AI agent has left a hallucination in my code - by which I mean a reference to something which doesn't exist at all - in many months. I have had great luck driving hallucinations to effectively 0% by using a language with static typechecking, telling LLMs to iterate on type errors until there are none left, and of course having a robust unit and e2e test suite. I mean, sure, I run into other problems -- it does make logic errors at some rate, but those I would hardly categorize the same as hallucinations.
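The loop I'm describing - typecheck, have the model iterate until the errors are gone, then run the tests - works because a made-up identifier can't survive contact with the checker. A minimal sketch of the idea in Python (the `BuildConfig` schema and its keys are invented purely for illustration):

```python
from dataclasses import dataclass

# Hypothetical config schema, just to illustrate the idea.
@dataclass
class BuildConfig:
    target: str
    optimize: bool = False

def load(raw: dict) -> BuildConfig:
    # A dataclass __init__ rejects unknown keyword arguments, so a
    # made-up key fails immediately instead of being silently ignored.
    # A static checker like mypy flags the same mistake before runtime
    # when the call is written out literally.
    return BuildConfig(**raw)

ok = load({"target": "x86_64", "optimize": True})

try:
    load({"target": "x86_64", "turbo_mode": True})  # hallucinated key
    caught = False
except TypeError:
    caught = True
```

Point being: in a setup like this a hallucinated key or method is a hard error the agent has to fix, not a plausible-looking line that slips into the codebase.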

Maybe you're lucky. I had Opus 4.6 hallucinate a non-existent configuration key in a well-known framework literally a few hours ago.

Granted, it fixed the problem in the very next prompt.

Couldn’t that problem be solved with static typechecking?

ChatGPT 5.2 kept gaslighting me yesterday, telling me that LLMs were explainable with Shapley values, and it kept referencing papers that mention both LLMs and SHAP - but those papers are about LLMs being used to explain the SHAP values of other ML models, not the other way around.

I encounter stuff like this every week, I don't know how you don't. I suppose a well-structured codebase in a statically typed language might not provide as much of a surface for hallucinations to present themselves? But like you say, logical problems of course still occur.

I meant to say that code generation never hallucinates. I suppose that was unclear.

[deleted]

>> I don't think an AI agent has left a hallucination in my code

I literally just went on Gemini, latest and best model and asked it "hey can you give me the best prices for 12TB hard drives available with the British retailer CeX?" and it went "sure, I just checked their live stock and here they are:". Every single one was made up. I pointed it out, it said sorry, I just checked again, here they are, definitely 100% correct now. Again, all of them were made up. This repeated a few times, I accused it of lying, then it went "you're right, I don't actually have the ability to check, so I just used products and values closest to what they should have in stock".

So yeah, hallucinations are still very much there and still very much feeding people garbage.

Not to mention I'm part of multiple FB groups for car enthusiasts, and the amount of AI misinformation we have to correct daily is just staggering. I'm not talking about political stuff - just people copy-pasting AI responses which confidently state that feature X exists or works a certain way, when in reality it has never existed at all.

My comment was about code, not fact checking - that’s why I said they were a solved problem provided you use static typechecking and tests.