What exactly has 2025 AI hallucinated for you? The last time I saw a hallucination from one of these things was a year ago. For the kinds of questions a kid or a student would ask, I'm not sure any reasonable person should be worried about this.

If the last time you saw a wrong answer was a year ago, then you are definitely regularly getting them and not noticing.

Just a couple of days ago, I submitted a few pages from the PDF of a PhD thesis written in French to ChatGPT, asking it to translate them into English. The first 2-3 pages were perfect, then the LLM started hallucinating, inserting new sentences and dropping parts. The interesting thing is that the added sentences were accurate and generally on point: the resulting text sounded plausible, and only a careful sentence-by-sentence comparison revealed the truth. Near the end of the chapter, virtually nothing of what ChatGPT produced was directly related to the original text.

Transformer models are excellent at translation, but decoder-only next-token prediction is not the right setup for it; you want something more like an encoder-decoder seq2seq model. Next-token prediction cares more about local consistency (i.e., going off on a tangent with a self-consistent but totally fabricated "translation") than about faithfulness to the source.
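For anyone who wants to see the difference concretely, here is a minimal sketch using the Hugging Face transformers library; the Helsinki-NLP/opus-mt-fr-en checkpoint and the French sentence are just illustrative examples, not anything from the thesis above. The point is that a seq2seq model conditions every output token on the encoded source sentence instead of merely continuing a prompt.

    # Minimal sketch: a dedicated encoder-decoder (seq2seq) translation model.
    # Assumes the Hugging Face `transformers` package; the checkpoint name and
    # the input sentence are illustrative placeholders.
    from transformers import pipeline

    translator = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")

    src = "La thèse examine les effets de la température sur la réaction."
    print(translator(src)[0]["translation_text"])
    # The encoder reads the whole French sentence before the decoder emits a
    # single English token, which keeps the output anchored to the source text.

A chat LLM, by contrast, only sees the source as part of a prompt to continue, so once it drifts there is nothing pulling it back to the original.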

I use it every day for work and every day it gets stuff wrong of the "that doesn't even exist" variety. Because I'm working on things that are complex + highly verifiable, I notice.

Sure, Joe Average who's using it to look smart in arguments on Reddit or HN, or to find out how to install a mod for their favorite game, isn't gonna notice anymore, because it's much more plausible much more often than two years ago. But if you're asking it things that aren't trivially easy for you to verify, you have no way of telling how frequently it hallucinates.

I had Google Gemini 2.5 Flash analyse a log file and it quoted content that simply didn't exist.

It looks to me like a form of decoherence, and it's very hard to predict when things will break down.

People tend to know when they are guessing. LLMs don't.

Nah it's not that rare.

This is one I got today:

https://chatgpt.com/share/6889605f-58f8-8011-910b-300209a521...

(image I uploaded: http://img.nrk.no/img/534001.jpeg)

The correct answer would have been Skarpenords Bastion/kruttårn.

OpenAI's o3/4o models completely spun out when I was trying to write a tiny little TUI with ratatui; they couldn't handle writing a render function. No idea why. I spent like 15 minutes trying to get it to work and ended up pulling up the docs.

I haven't spent any money with Claude on this project and realistically it's not worth it, but I've run into little things like that a fair amount.

>Thanks all for the replies, we’re hardcoding fixes now

-LLM devcos

Jokes aside, get deep into the domains you know. Or ask it to give movie titles based on specific parts of uncommon films. And definitely ask it for instructions for specific software tools (“no actually Opus/o3/2.5, that menu isn’t available in this context”, etc.).

For starters, lots of examples over the last few months where AIs make up stuff when it comes to coding.

A couple of non-programming examples: https://www.evidentlyai.com/blog/llm-hallucination-examples

Are you using them daily? I find that for maybe 3 or 4 of the programming questions I ask per day, they simply cannot provide a correct answer even after hand-holding. They often go to extreme gymnastics to gaslight you, no matter how much proof you provide.

For example, today I was asking an LLM how to configure a GH action to install an SDK version that had just recently gone out of support. It kept hallucinating about my config, claiming that when you provide multiple SDK versions, the action only picks the most recent. This is false, and the documentation, which I linked for the LLM, specifically says it installs every version you list. When I explained this to Copilot, it kept doubling down and ignoring the docs, even going as far as asking me to have the action output the installed SDKs; then, after seeing every version I requested listed as installed, it gaslit me by claiming that `--list-sdks` can print out the wrong SDKs.
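For reference, and assuming this was actions/setup-dotnet (the `--list-sdks` bit suggests .NET), this is roughly what the multi-version config looks like; the version numbers below are placeholders, not my actual ones:

    # Sketch only: assumes actions/setup-dotnet; version numbers are placeholders.
    # The action installs every version listed here, not just the newest one.
    - name: Set up .NET SDKs
      uses: actions/setup-dotnet@v4
      with:
        dotnet-version: |
          6.0.x
          8.0.x

    # Sanity check: print everything that actually got installed.
    - name: List installed SDKs
      run: dotnet --list-sdks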

ChatGPT hallucinates things all the time. I will feed it info on something and have a conversation. At first it's mostly fine, but eventually it starts just making stuff up.

I've found that giving it occasional nudges (like reminding it of the original premise) can help keep it on track.

Ah yes it is a fantastic tool when you manually correct it all the time.

For me, most commonly ChatGPT hallucinates configuration options and command line arguments for common tools and frameworks.

Two days ago, when my boomer mother-in-law tried to justify her anti-cancer diet, the one that killed Steve Jobs. On the bright side, my partner will be inheriting soon by the looks of it.

Not defending your mother-in-law here (because I agree with you that it is a pretty silly and maybe even potentially harmful diet), but afaik it wasn’t the diet itself that killed Steve Jobs. It was his decision to follow that diet instead of getting actual cancer treatment until it was too late.

Given that I've got two people here telling me "ackshually", I guess it may not be hallucination, just really terrible training data.

Up next: ChatGPT, does jumping off high buildings kill you?

>>No, jumping off high buildings is perfectly safe as long as you land skillfully.

Jobs's diet didn't kill him. Not getting his cancer treated was what killed him.

Yes, we also covered that jumping off buildings doesn't kill people. The landing does.

Indeed if you're a base jumper with a parachute, you might survive the landing.

Ackshually, this seems analogous to Jobs's diet and refusal of cancer treatment! And it was the cancer that put him at the top of the building in the first place.

The anti-cancer diet absolutely works if you want to reduce the odds of getting cancer. It probably even works to slow a cancer compared to the average American diet. Will it stop and reverse a cancer? Probably not.

I thought it was high-fiber diets that reduce the risk of cancer (ever so slightly), because of reduced inflammation. Not fruity diets, which are high in carbohydrates.

Cutting red or preserved meat cuts bowel cancer risk, so fruity diets would cut that risk.

How much does it 'reduce the odds'?

Idk, I'm not an encyclopedia. You can Google it.

Last week I was playing with the jj VCS and it couldn't even understand my question (how to swap two commits).

How do you know? It's literally non-deterministic.

Most (all?) AI models I work with are literally deterministic. If you give it the same exact input, you get the same exact output every single time.

What most people call “non-deterministic” in AI is that one of those inputs is a _seed_ that is sourced from a PRNG because getting a different answer every time is considered a feature for most use cases.

Edit: I’m trying to imagine how you could get a non-deterministic AI and I’m struggling because the entire thing is built on a series of deterministic steps. The only way you can make it look non-deterministic is to hide part of the input from the user.

This is an incredibly pedantic argument. The common interfaces for LLMs set their temperature value to non-zero, so they are effectively non-deterministic.
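A toy sketch of what that sampling step does (plain NumPy, made-up logits, nothing from a real LLM stack): at temperature 0 the choice collapses to argmax, while at any higher temperature the pick comes from a PRNG whose seed the chat interface hides from you.

    import numpy as np

    logits = np.array([2.0, 1.0, 0.5])  # made-up scores for three candidate tokens

    def next_token(logits, temperature, seed=None):
        """Pick the next token the way a typical sampler does."""
        if temperature == 0:
            return int(np.argmax(logits))      # greedy: fully deterministic
        scaled = logits / temperature
        probs = np.exp(scaled - scaled.max())
        probs /= probs.sum()
        rng = np.random.default_rng(seed)      # the hidden input the interface picks for you
        return int(rng.choice(len(probs), p=probs))

    print(next_token(logits, temperature=0))             # always the same token
    print(next_token(logits, temperature=0.8, seed=42))  # same seed -> same pick every run
    print(next_token(logits, temperature=0.8, seed=7))   # different seed -> possibly different pick

Everything in there is deterministic given its inputs; the everyday non-determinism comes from the interface choosing a fresh seed (and a non-zero temperature) on every request.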

From the good old days: https://152334h.github.io/blog/non-determinism-in-gpt-4/ (that's been a short two years).

Unless something has fundamentally changed since then (which I've not heard about) all sparse models are only deterministic at the batch level, rather than the sample level.

Even with temperature=0, I believe there is some non-determinism at the chip level, similar to https://stackoverflow.com/questions/50744565/how-to-handle-n...
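A self-contained way to see the underlying issue, with plain NumPy standing in for a GPU kernel: floating-point addition is not associative, so summing the same numbers in a different order (which is effectively what happens when a parallel reduction's scheduling or batch shape changes) can move the result by a few last bits.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal(1_000_000).astype(np.float32)

    # Same values, two different summation orders.
    whole = np.sum(x)                                          # pairwise summation over the full array
    chunked = sum(np.sum(c) for c in np.array_split(x, 1000))  # per-chunk sums accumulated sequentially

    print(whole, chunked, whole == chunked)  # the two totals typically differ in the last bits

Harmless on its own, but a greedy decoder takes an argmax over the logits, and when two candidates are nearly tied that last-bit wobble is enough to flip the chosen token.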

> I’m trying to imagine how you could get a non-deterministic AI

Depends on the machine that implements the algorithm. For example, it’s possible to make ALUs such that 1+1=2 most of the time, but not all the time.

Just ask Intel. (Sorry, I couldn’t resist)

So by default, it's non-deterministic for all non-power users.