So like every software? Why do you think there are so many security scanners and whatnot out there?

There are millions of lines of code running on a typical box. Unless you're in embedded, you have no real idea what you're running.

...No, it's not at all "like every software".

This seems like another instance of a problem I see so, so often in regard to LLMs: people observe that LLMs are fundamentally nondeterministic, in ways that cannot truly be predicted or learned over the long term, and they mistakenly equate that with the fact that humans, other software, what have you, sometimes make mistakes, in ways that are generally understandable, predictable, and remediable.

Just because I don't know what's in every piece of software I'm running doesn't mean it's all equally unreliable, nor that it's unreliable in the same way that LLM output is.

That's like saying just because the weather forecast sometimes gets it wrong, meteorologists are complete bullshit and there's no use in looking at the forecast at all.

> ...No, it's not at all "like every software"

Yes, it is, at least through the lens the person above offered.

In practice, all we ever get to deal with is the empirical/statistical, and the person above was making an argument that singled out LLMs for being statistical. You may reject my taking issue with this on principled grounds, because regular programs are just structured logic, but they cease to be just that once you actually run them. Real hardware runs them. Even fully verified, machine-checked, correctly designed and specified software, interacting only with other such software, can enter an inconsistent state through no fault of its own. Theory stops being theory once you put it into practice. And the vast majority of programs fail the aforementioned criteria to begin with.

> people observe the fact that LLMs are fundamentally nondeterministic

LLMs are not "non-deterministic", let alone fundamentally so. If I launch a model locally, pin the seed, and ask the exact same question 10x, I'll get the same answer every single time down to the very byte. Provided you select your hardware and inference engine correctly, the output remains reproducible even across different machines. They're not even stateful! You literally send along the entire state (context window) every single time.
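The pinned-seed point can be sketched with a toy sampler (all names here are hypothetical, and this stands in for a model's softmax output rather than real inference; on real hardware you would additionally need deterministic kernels for byte-identical results):

```python
import random

# Toy next-token distribution; a stand-in for a model's softmax output.
VOCAB = ["the", "cat", "sat", "on", "mat"]
PROBS = [0.4, 0.25, 0.15, 0.12, 0.08]

def sample_tokens(seed: int, n: int = 10) -> list[str]:
    """Sample n tokens with a pinned RNG seed.

    Sampling is random, but seeding makes it reproducible:
    the same seed always yields the same token sequence.
    """
    rng = random.Random(seed)
    return [rng.choices(VOCAB, weights=PROBS)[0] for _ in range(n)]

# "Ask the same question 10x": every run with a pinned seed is identical.
runs = [sample_tokens(seed=42) for _ in range(10)]
assert all(run == runs[0] for run in runs)
```

The statelessness point works the same way: there is no hidden state carried between calls, so feeding in the same inputs (context plus seed) reproduces the same outputs.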

Now obviously, you might instead mean a more "practical" version of this, their general semantic unpredictability. But even then, every now and then I do ask the "same" question to LLMs, and they keep giving essentially the "same" response. They're pretty darn semantically stable.

> In ways that are generally understandable, predictable, and remediable.

You could say the same thing about the issue in the OP: it is very easy to understand, behaves super predictably, and will (imo) be remediated just fine by the various service providers.

Now think of all the hard-to-impossible-to-reproduce bugs people just end up working around. The never-ending list of vulnerabilities and vulnerability categories. The inexplicable errors that arise from real-world hardware issues. Yes, LLMs are statistical in nature, not artisanally hardwired. But in the end, they're operated in the same empirical way, along the same lines of concern, and at times with surprisingly similar outcomes and consequences.

You're not going to understand the millions (or really, tens or hundreds of millions) of lines of code running on a typical machine. You'll never be able to exhaustively predict their behavior (especially how they interact with terabytes of data or more over time) or the defects contained within. You'll never remediate those defects fully. Hell, even for classes of problems where such a thing would be structurally achievable, people are resisting the change.

If they want to take issue with LLMs, plainly gesturing at their statistical nature is just not particularly convincing. Not in a categorical, drop-the-mic way, at least.

> That's like saying just because the weather forecast sometimes gets it wrong, meteorologists are complete bullshit and there's no use in looking at the forecast at all.

Are you really not seeing that GP is saying exactly this about LLMs?

What you want for this to be practical is verification and a low enough error rate. Same as in any human-driven development process.