Not the parent poster, but this is pretty much the foundation of LLMs. They are by their nature probabilistic, not deterministic. This is precisely what the parent is referring to.

All processes in reality, everywhere, are probabilistic. The entire reason "engineering" is not the same as theoretical mathematics is that engineering is about managing those probabilities to an acceptable level for the task you're trying to perform. You are getting a "probabilistic" output from a human too. Human beings are not guaranteeing theoretically optimal Excel output when they send their boss Final_Final_v2.xlsx. You are using your mental model of their capabilities to inform how much you trust the result.

Building a process to get a similar confidence in LLM output is part of the game.

I have to disagree. There are many areas where things are extremely deterministic, regulated financial services being one of them. As one example of zillions, look at something like bond math. All of it is very well defined, all the way down to which day-count convention you use (30/360 or what have you), rounding, etc. It's all extremely well defined specifically so you can get apples-to-apples comparisons in the marketplace.
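To make the determinism concrete: a 30/360 day count is pure arithmetic with no statistical component. A minimal sketch of the simple US (Bond Basis) variant, ignoring the end-of-February edge cases in the full convention (the function name is my own, for illustration):

```python
def days_30_360(y1, m1, d1, y2, m2, d2):
    """US (Bond Basis) 30/360 day count: every month is treated as
    30 days, every year as 360. Same inputs, same answer, every time."""
    d1 = min(d1, 30)
    if d1 == 30:
        d2 = min(d2, 30)
    return 360 * (y2 - y1) + 30 * (m2 - m1) + (d2 - d1)

# Accrual fraction for a semi-annual coupon period, Jan 15 to Jul 15:
# exactly 180 days over a 360-day year, i.e. 0.5 -- no probability involved.
frac = days_30_360(2024, 1, 15, 2024, 7, 15) / 360
```

This is the kind of rule a spreadsheet either follows or doesn't; "close" is not a meaningful outcome.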

The same applies to my checkbook, and many other areas of either calculating actuals or where future state is well defined by a model.

That said, there can be a statistical aspect to any spreadsheet model. Obviously. But not all spreadsheets are statistical, and therein lies the rub. If an LLM wants to hallucinate a 9,000-day yearly calendar because it confuses our notion of a year with one of the outer planets, that falls well within probability, but not within determinism following well-defined rules.

The other side of the issue is LLMs trained on the Internet. What are the chances that Claude or whatever is going to make a change based on a widely prevalent but incorrect spreadsheet it found on some random corner of the Internet? Do I want Claude breaking my well-honed spreadsheet because Floyd in Nebraska counted sheep wrong in a spreadsheet he uploaded and forgot about 5 years ago, and Claude found it relevant?

Yup. It becomes clearer to me when I think about the existing validators. Can these be improved? For sure.

It’s when people leap to the multi-year endgame, and try to monetise by building overconfidence in the product, that I see the inherent conflict.

It’s going to be a slog… the detailed implementations. And to the extent anyone is being a bit more realistic about managing expectations, I think Anthropic is doing it a little better.

> All processes in reality, everywhere, are probabilistic.

If we want to get into philosophy then sure, you're correct, but that's not what we're saying.

For example, an LLM is capable of creating a reference to a non-existent source (and it's highly plausible that it will). Humans generally don't do that when their goal is clear and aligned (hence deterministic).

> Building a process to get a similar confidence in LLM output is part of the game.

Which is precisely my point. LLMs are supposed to be better than humans. We're (currently) shoehorning the technology.

> Humans generally don't do that when their goal is clear and aligned (hence deterministic).

Look at the language you're using here. Humans "generally" make fewer of these kinds of errors. "Generally". That is literally an assessment of likelihood. It is completely possible for me to hire someone so stupid that they create a reference to a non-existent source. It's completely possible for my high-IQ genius employee who is correct 99.99% of the time to have an off day and accidentally fat-finger something. It happens. Perhaps it happens at 1/100th of the rate that an LLM would do it. But that is simply an input to the model of the process or system I'm trying to build, one I need to account for.
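The arithmetic in that framing can be run directly. A toy sketch, with every rate invented purely for illustration: if an independent review step catches most errors, the residual risk from either source is just multiplication:

```python
# Hypothetical error rates -- illustrative only, not measured values.
p_llm = 1e-2          # LLM fabricates or fat-fingers 1% of the time
p_human = p_llm / 100 # the "1/100th of the rate" human from above
p_catch = 0.99        # an independent review catches 99% of errors

# Residual rate of errors that slip past review, for each source.
residual_llm = p_llm * (1 - p_catch)      # ~1e-4
residual_human = p_human * (1 - p_catch)  # ~1e-6
```

Whether either residual rate is acceptable depends on the task, which is the whole point: the error rate is an input to the process you design around it, not a verdict on its own.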

When humans make mistakes repeatedly in their job they get fired.