Is this just a feeling you have, or is it downstream of actual use cases where you've applied AI and observed and measured its reliability?
Not the parent poster, but this is pretty much the foundation of LLMs. They are by their nature probabilistic, not deterministic. This is precisely what the parent is referring to.
All processes in reality, everywhere, are probabilistic. The entire reason "engineering" is not the same as theoretical mathematics is about managing these probabilities to an acceptable level for the task you're trying to perform. You are getting a "probabilistic" output from a human too. Human beings are not guaranteeing theoretically optimal Excel output when they send their boss Final_Final_v2.xlsx. You are using your mental model of their capabilities to inform how much you trust the result.
Building a process to get a similar confidence in LLM output is part of the game.
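For what that process might look like, here's a minimal sketch in Python. `generate` stands in for the probabilistic step (an LLM call, say) and `validate` for a deterministic check such as a schema check, a unit test, or a reconciliation against known totals. Both names are placeholders, not a real API.

    # Minimal sketch: wrap a nondeterministic generator in a deterministic
    # check-and-retry loop. `generate` and `validate` are hypothetical callables.
    def generate_with_checks(prompt, generate, validate, max_attempts=3):
        last_error = None
        for _ in range(max_attempts):
            draft = generate(prompt)             # probabilistic step: LLM (or a human, for that matter)
            ok, last_error = validate(draft)     # deterministic step: pass/fail plus an error message
            if ok:
                return draft
            prompt = f"{prompt}\n\nThe previous attempt failed validation: {last_error}. Please fix it."
        raise RuntimeError(f"No valid output after {max_attempts} attempts: {last_error}")

The shape is the point: you don't need the generator to be deterministic as long as the acceptance criteria are.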
I have to disagree. There are many areas where things are extremely deterministic, regulated financial services being one of them. As one example of zillions, look at something like Bond Math. All of it is very well defined, all the way down to what day-count convention you will use (30/360 or what have you), rounding, etc. It's all extremely well defined specifically so you can get apples-to-apples comparisons in the marketplace.
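To make "well defined" concrete, here's a rough sketch of the standard US 30/360 day count. It's illustrative only (real implementations also handle end-of-February and other convention variants), but the point is that two independent implementations must agree to the last digit.

    from datetime import date

    def day_count_30_360(start: date, end: date) -> float:
        """30/360 (US Bond Basis) day-count fraction, simplified:
        every month is 30 days, every year is 360 days."""
        d1, d2 = start.day, end.day
        if d1 == 31:
            d1 = 30
        if d2 == 31 and d1 == 30:
            d2 = 30
        days = 360 * (end.year - start.year) + 30 * (end.month - start.month) + (d2 - d1)
        return days / 360.0

    print(day_count_30_360(date(2024, 1, 31), date(2024, 7, 31)))  # 0.5, exactly, every time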
The same applies to my checkbook, and many other areas of either calculating actuals or where future state is well defined by a model.
That said, there can be a statistical aspect to any spreadsheet model. Obviously. But not all spreadsheets are statistical, and therein lies the rub. If an LLM wants to hallucinate a 9,000-day yearly calendar because it confuses our notion of a year with one of the outer planets, that falls well within probability, but not within determinism following well-defined rules.
The other side of the issue is LLMs trained on the Internet. What are the chances that Claude or whatever is going to make a change based on a widely prevalent but incorrect spreadsheet it found on some random corner of the Internet? Do I want Claude breaking my well-honed spreadsheet because Floyd in Nebraska counted sheep wrong in a spreadsheet he uploaded and forgot about 5 years ago, and Claude found it relevant?
Yup. It becomes clearer to me when I think about the existing validators. Can they be improved? For sure.
It's when people leap to the multi-year endgame and, in their effort to monetise, build overconfidence in the product that I see the inherent conflict.
It's going to be a slog… the detailed implementations. And if anyone is being a bit more realistic about managing expectations, I think Anthropic is doing it a little better.
> All processes in reality, everywhere, are probabilistic.
If we want to go into philosophy then sure, you're correct, but that's not what we're saying.
For example, an LLM is capable of creating a reference to a non-existent source (and it's quite plausible that it will). Humans generally don't do that when their goal is clear and aligned (hence deterministic).
> Building a process to get a similar confidence in LLM output is part of the game.
Which is precisely my point. LLMs are supposed to be better than humans. We're (currently) shoehorning the technology.
> Humans generally don't do that when their goal is clear and aligned (hence deterministic).
Look at the language you're using here. Humans "generally" make fewer of these kinds of errors. "Generally". That is literally an assessment of likelihood. It is completely possible for me to hire someone so stupid that they create a reference to a non-existent source. It's completely possible for my high-IQ genius employee who is correct 99.99% of the time to have an off day and accidentally fat-finger something. It happens. Perhaps it happens at 1/100th of the rate that an LLM would do it. But that is simply an input to the model of the process or system I'm trying to build that I need to account for.
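To put made-up, purely illustrative numbers on that: once the error rate is just an input, you design around it the same way regardless of whether the generator is a human or a model.

    # Illustrative numbers only. The point is that the error rate is a design input.
    llm_error_rate   = 0.05   # assumed: 1 in 20 outputs contains a fabricated reference
    review_miss_rate = 0.05   # assumed: an independent check misses 1 in 20 such errors

    undetected = llm_error_rate * review_miss_rate
    print(f"Undetected error rate with one independent check: {undetected:.2%}")  # 0.25%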
When humans make mistakes repeatedly in their job they get fired.
Not OP but using LLMs in any professional setting, like programming, editing or writing technical specifications, OP is correct.
Without extensive prompting and injecting my own knowledge and experience, LLMs generate absolutely unusable garbage (on average). Anyone who disagrees very likely is not someone who would produce good-quality work by themselves (on average). That's not a clever quip; that's a very sad reality. SO MANY people cannot be bothered to learn anything if they can help it.
The triad of LLM dependencies, in my view: initiation of tasks, experience-based feedback, and a consequence sink. They can do none of these; they all connect to the outer context, which sits with the user, not the model.
You know what? This is also not unlike hiring a human: they need the hiring party to tell them what to do, give feedback, and assume the outcomes.
It's all about context, which is non-fungible and distributed; it's not about intelligence itself but about what we need intelligence for.
> Anyone who disagrees very likely is not someone who would produce good quality work by themselves (on average).
So for those producing slop and not knowing any better (or not caring), AI just improved the speed at which they work! Sounds like a great investment for them!
For many mastering any given craft might not be the goal, but rather just pushing stuff out the door and paying bills. A case of mismatched incentives, one might say.
I would completely disagree. I use LLMs daily for coding. They are quite far from AGI and it does not appear they are replacing Senior or Staff Engineers any time soon. But they are incredible machines that are perfectly capable of performing some economically valuable tasks in a fraction of the time it would have taken a human. If you deny this your head is in the sand.
Capable, yeah, but not reliable, that's my point. They can one shot fantastic code, or they can one shot the code I then have to review and pull my hair out over for a week, because it's such crap (and the person who pushed it is my boss, for example, so I can't just tell him to try again).
That's not consistent.
You can ask your boss to submit PRs using Codex’s “try 5 variations of the same task and select the one you like most”, though.
Surely at that point they could write the code themselves faster than they can review 5 PRs.
Producing more slop for someone else to work through is not the solution you think it is.
Why do you frame the options as "one shot... or... one shot"?
Because lazy people will use it like that, and we are all inherently lazy
It's not much better with planning either. The amount of time I spent planning, clarifying requirements, and hand-holding implementation details always offset any potential savings.
Have you never used one to hunt down an obscure bug and found the answer quicker than you likely would have yourself?
Actually, yeah, a couple of times, but that was a rubber-ducky approach; the AI said something utterly stupid, but while trying to explain things, I figured it out. I don't think an LLM has solved any difficult problem for me before. However, I think I'm likely an outlier because I do solve most issues myself anyways.