I have been developing software since the late 80s, mostly CAM software for metal cutting machines, and I have been refereeing tabletop roleplaying games like Dungeons & Dragons since the late 70s.

I get the power of LLMs, and I do find them useful. But I find them useful in much the same way I find a really good set of random tables useful, or a good set of rules for procedurally generating something like a star sector for a science fiction campaign.

For my day job developing software, and for the RPG campaigns and books I run and publish today, LLMs are, in many cases, random tables on steroids. After using them for two years, even with all their improvements, I am continually reminded by the results I get that, at the heart of it, I am still dealing with what amounts to randomly generated content.

Yes, I know it is more accurate to call the process probabilistic rather than random. And yes, somebody can construct a technically deterministic setup with fixed weights, fixed seeds, fixed sampling parameters, and a frozen runtime environment. But that is like saying you can recreate a rainstorm if you get a thousand butterflies to flap their wings in exactly the right way. It may be technically true, but it is not how the technology behaves in normal day-to-day use.

For practical purposes, given the same prompt and the same apparent starting conditions, the result can differ each time you use a model. The outputs will often be highly correlated, and often useful, but they are not deterministic software in the ordinary sense.
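To make the random-table comparison concrete, here is a minimal sketch of a weighted table sampled with a temperature, which is roughly the shape of the sampling step in an LLM. The table entries, the weights, and the names `TABLE`, `sample`, and `roll_many` are all made up for illustration; this is not how any particular model is implemented.

```python
import random

# Hypothetical encounter table; entries and weights are invented for this example.
TABLE = {"goblin": 0.5, "bandit": 0.3, "dragon": 0.15, "mimic": 0.05}


def sample(table, temperature, rng):
    """Draw one entry after reshaping the weights with a temperature."""
    names = list(table)
    # Raising weights to 1/T sharpens (T < 1) or flattens (T > 1) the distribution.
    weights = [table[name] ** (1.0 / temperature) for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


def roll_many(seed, n=10, temperature=1.0):
    """Roll n times. A fixed seed repeats exactly; seed=None uses OS entropy."""
    rng = random.Random(seed)
    return [sample(TABLE, temperature, rng) for _ in range(n)]


if __name__ == "__main__":
    print(roll_many(42))    # same list on every run
    print(roll_many(None))  # usually a different list on every run
```

With the seed pinned, the rolls repeat exactly. Without it, the rolls are correlated (mostly goblins) but not identical, which is the behavior I keep running into in day-to-day use.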

So far, I am failing to see how the inherent probabilistic nature of the technology can be fully overcome. I understand how we got to where we are today from older neural net technology, including the systems used for vision and sound. What we have now can be very useful. But my view is that it is being badly oversold and overhyped. Its probabilistic nature is being vastly underestimated, and that is a major reason for much of the weirdness and many of the failures we keep seeing.

In tabletop roleplaying, there have been times when hobbyists relied too much on procedurally generated content and ultimately got burned by it, either through campaigns that were not as fun or products that were subpar. Each time, the lesson was the same: there is no substitute for human judgment.

Any workflow or technology incorporating LLMs has to keep humans in the loop, and not merely as rubber stamps. The human has to remain the primary decision maker.

> So far, I am failing to see how the inherent probabilistic nature of the technology can be fully overcome

I deeply hope we never reach the point where that’s overcome. What we’ve seen over the past few years is how AI will destroy humanness across pretty much the entire digital realm. It’s by far the most evil, anti-human technology ever created, corrupting everything it touches. The last thing we need is for it to become reliable.

The trouble is: there is no deterministic algorithm that can do the things neural networks can do.

For many of these problems, I think it is likely that no deterministic algorithm can exist because the problems are fundamentally underspecified. E.g. a common task in computer vision is generating a 3D depth map from a 2D image. This is inverting a lossy projection, so any solution must be at least partially a hallucination.
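A small sketch of why the projection is lossy, assuming a simple pinhole camera model (the `project` helper and the sample points are hypothetical, chosen only for illustration): two different 3D points along the same ray land on the same 2D image coordinate, so depth cannot be recovered from that coordinate alone.

```python
from fractions import Fraction


def project(point, focal_length=1):
    """Pinhole projection of an integer 3D point (x, y, z) onto the image plane."""
    x, y, z = point
    # Exact fractions avoid floating-point noise in the comparison below.
    return (Fraction(focal_length * x, z), Fraction(focal_length * y, z))


near = (1, 2, 4)  # a point 4 units from the camera
far = (2, 4, 8)   # a different point, twice as far, on the same ray

# Both points produce the same image coordinate.
assert project(near) == project(far)
```

Anything that outputs a single depth for that pixel is choosing among infinitely many consistent answers, using priors rather than logic.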

I think we just have to accept this. It's a different type of algorithm, built out of statistics instead of logic, with different strengths and weaknesses compared to traditional software.

> I am failing to see how the inherent probabilistic nature of the technology can be fully overcome.

Fixing the seed is common in image generation pipelines: if you find an image you really like, you can store the seed and then reproduce it with small tweaks. Otherwise, to quote Borges, “Look at it well. You will never see it again.” User-facing deterministic pipelines do exist for generative AI.

I know you make this argument in your post, but that's really the answer if you want repeatable results. For a classifier or a detector, determinism is a requirement, but for an LLM non-determinism is a desirable property because it feels like a more natural conversation. The downside is it's extremely difficult to replicate a response without pointing the model to an earlier conversation.

And specifically for the RPG case, don't you want non-determinism? You don't want the model spinning up the same identical person if you say "Generate me an NPC character sheet for an innkeeper". This was a complaint that people had in the past, that models would regurgitate the same scenarios or the same jokes.

Where I suspect DMs run into trouble is not randomness, but lack of self-consistency in worldbuilding. Say you generate an NPC and then refer back to them later and the model gets some details wrong. Compare that to a system like Dwarf Fortress, where everything down to the genealogy and faction relationships is rigidly generated.
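One way to get that kind of self-consistency is to generate each fact once, store it, and feed the stored record back in on later references instead of asking the model to remember. A rough sketch, with a hypothetical `Canon` class and made-up name and trait lists standing in for whatever generator you actually use:

```python
import random

# Invented sample data for illustration only.
NAMES = ["Brom", "Ilsa", "Tavik", "Merrin"]
TRAITS = ["gruff", "cheerful", "suspicious", "forgetful"]


class Canon:
    """Stores generated NPCs so later references reuse the same details."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._npcs = {}

    def npc(self, key):
        # Generate only on first mention; afterwards the stored record wins.
        if key not in self._npcs:
            self._npcs[key] = {
                "name": self._rng.choice(NAMES),
                "trait": self._rng.choice(TRAITS),
            }
        return self._npcs[key]

    def facts_for_prompt(self, key):
        """Render the stored record as a line to prepend to a later prompt."""
        record = self.npc(key)
        return f"{key}: {record['name']}, {record['trait']}"
```

The randomness stays in the first roll, and everything after that is a lookup.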

Setting aside that we're living in a universe that's full of (practically) deterministic processes built over probabilistic components (and which behave sufficiently reliably without any human in the loop), I think the specific failure mode you're citing is that there aren't enough gates and constraints applied to the processes you've seen.

LLMs can contribute quite reliably given very narrow prompts and short horizons (keeping turns low and context brief). If you chain a bunch of these narrow contributions together and define guardrails (structured outputs, online evals, another LLM as judge or jury, etc.) you can produce a very repeatable workflow that reliably delivers to defined service levels.

The obvious catch is that you've got to define the workflow and implement all the guardrails yourself, not hope that the LLM will infer them during a session or a one-shot prompt.
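As a rough sketch of one such guardrail, here is a structured-output check with a bounded retry. The schema in `REQUIRED`, the helper names, and the `fake_model` stand-in are all assumptions for illustration; a real workflow would call an actual model where `generate` is passed in.

```python
import json

# Hypothetical schema: field name -> required Python type.
REQUIRED = {"name": str, "level": int}


def validate(raw):
    """Return the parsed dict if raw is JSON matching REQUIRED, else None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    for key, expected in REQUIRED.items():
        if type(data.get(key)) is not expected:
            return None
    return data


def guarded_call(generate, prompt, max_attempts=3):
    """Call generate(prompt) until the output validates or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        data = validate(generate(prompt))
        if data is not None:
            return data, attempt
    raise ValueError(f"no valid output after {max_attempts} attempts")


def fake_model(replies):
    """Stand-in for a model call: returns canned replies in order."""
    pending = iter(replies)
    return lambda prompt: next(pending)


if __name__ == "__main__":
    model = fake_model(["not json", '{"name": "Brom"}', '{"name": "Brom", "level": 3}'])
    print(guarded_call(model, "Generate an innkeeper"))
```

The gate is ordinary deterministic code, so the workflow either delivers something that fits the schema or fails loudly. It does not check whether the content is any good; that is what the evals and judges are for.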

I think we need to disqualify humans as well. Their brains have been shown to operate on probabilistic chemical interactions and even quantum effects.

That doesn’t disqualify humans. It highlights the difference I am talking about.

Those chemical interactions and quantum effects lead to emergent properties like judgment, experience, context, accountability, and an understanding of consequences. Those are not properties that LLMs possess, regardless of how useful their output can be.

That is not to say that, in the future, LLMs won’t be used as part of other systems that add some of those properties. But that is not what we have today, or what can be seen in the foreseeable near future.

> Those are not properties that LLMs possess, regardless of how useful their output can be.

What makes you say that? Other than the usual "I'm a human, and humans must be very special, so when something that's not a human does X, it's either not real X, or X wasn't important in the first place".

It highlights, in my eyes, that "critical flaws" of LLMs are the same exact flaws that humans routinely suffer from. Sometimes LLMs have it worse, but sometimes they have it better too.

LLMs do improve release to release though. Humans are more of a mixed bag.

My understanding is that the quantum effects have zero impact, see https://en.wikipedia.org/wiki/Orchestrated_objective_reducti.... It's currently fringe/unproven science.