I was pointing out one screenshot from twitter isn’t proof of anything just to be clear; it’s a silly way to make a point.
yes AI makes leaking keys on GH more prevalent, but so what? it’s the same problem as before with roughly the same solution
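The "roughly the same solution" here is automated secret scanning before code leaves the machine. A minimal Python sketch of the kind of check a pre-commit hook runs; real scanners such as gitleaks or git-secrets ship far more rules plus entropy checks, and the patterns below are only illustrative:

```python
import re

# A few well-known credential formats; purely illustrative, not a
# complete rule set like a real secret scanner would ship.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                      # AWS access key ID
    re.compile(r"ghp_[A-Za-z0-9]{36}"),                   # GitHub personal access token
    re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
]

def find_secrets(diff_text: str) -> list[str]:
    """Return every added line of a diff that looks like a credential."""
    hits = []
    for line in diff_text.splitlines():
        # Only inspect lines being added (a real hook would also skip
        # the '+++' file headers).
        if line.startswith("+") and any(p.search(line) for p in SECRET_PATTERNS):
            hits.append(line)
    return hits
```

In an actual hook the diff would come from `git diff --cached`, and a non-empty result would abort the commit with a non-zero exit code; the mechanism is the same whether the key was typed by a human or pasted in by an AI assistant.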
I’m saying neural networks being probabilistic doesn’t matter — everything is probabilistic. you can still practically use the tools to great effect, just like we use everything else that has underlying probabilities
OpenAI did not have to describe it as sycophancy, they chose to, and I’d contend it was a stupid choice
and yes, you can explain what went wrong just like you can with CPUs. we don’t (usually) talk about quantum-level physics when discussing CPUs; talking about neurons in LLMs is the wrong level of abstraction
> I was pointing out one screenshot from twitter isn’t proof of anything just to be clear; it’s a silly way to make a point.
Versus your anecdote being proof of what? A skill issue for vibe coders? Someone else prompting it wrong?
You do realize you are proving my entire point?
> yes AI makes leaking keys on GH more prevalent, but so what? it’s the same problem as before with roughly the same solution
Again, that reinforces my point: it makes the existing issue even worse. And that wasn't even the only point I made on the subject.
> I’m saying neural networks being probabilistic doesn’t matter — everything is probabilistic.
When you scale neural networks up to, say, production-grade LLMs, then it does matter, just like it matters for CPUs to be reliable when you scale them across production-grade data centers.
But your earlier (fallacious) comparison ignores the reliability differences between CPUs and LLMs. Determinism is a hard requirement for that kind of reliability, and LLMs do not have it.
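The determinism distinction being argued over can be made concrete. A toy Python sketch of next-token selection, not any real model's decoder: greedy decoding is reproducible, temperature sampling is not:

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw scores into a probability distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_pick(logits):
    """Deterministic decoding: always the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])

def sample_pick(logits, temperature, rng):
    """Sampled decoding: identical inputs can yield different tokens."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]
```

Calling `sample_pick` repeatedly on the same logits returns different token indices across calls, while `greedy_pick` never varies. Note that even "temperature 0" production serving is not guaranteed bit-identical across runs, since batching and floating-point non-associativity can perturb the logits themselves.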
> OpenAI did not have to describe it as sycophancy, they chose to, and I’d contend it was a stupid choice
For the press they had to say something, but no one knows the real reason, because it is unexplainable, which goes back to my other point on reliability.
> and yes, you can explain what went wrong just like you can with CPUs. we don’t (usually) talk about quantum-level physics when discussing CPUs; talking about neurons in LLMs is the wrong level of abstraction
It is indeed the wrong level of abstraction for LLMs, because not even the researchers can practically explain why every single neuron in the network takes different values on each fine-tune or training run. Even if the model is "good enough", it can still go wrong at inference time for unexplainable reasons beyond simply having "overfitted".
CPUs, on the other hand, have formal verification methods that check the design conforms to its specification, so we can trust that they work as intended and can diagnose problems accurately without going into atomic-level details.
…what is your point exactly (and concisely)? I’m saying it doesn’t matter it’s probabilistic, everything is, the tech is still useful
No one is arguing that it isn't useful. The problem is this:
> I’m saying it doesn’t matter it’s probabilistic, everything is,
Maybe it doesn't matter for you, but it generally does matter.
The risk of a technology failing is far higher when it is random and unexplainable than when it is predictable, verified, and explainable. The former rules out many serious use-cases.
This is why your CPU, or GPU works.
LLMs are not deterministic, no formal verification exists for them, and they are fundamentally black boxes.
That is why so many vibe coders have reported "the AI deleted my entire home folder" issues, even when all they asked it to do was move a file or folder to another location.
If it did not matter, why do you need sandboxes for the agents in the first place?
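One common shape for such a sandbox is a path guard: every file operation an agent requests is resolved first and rejected if it escapes an allowed root. A minimal Python sketch; `guarded_move` and `SandboxError` are illustrative names, not any real agent framework's API:

```python
from pathlib import Path
import shutil

class SandboxError(Exception):
    """Raised when a requested path escapes the sandbox."""

def resolve_inside(root: Path, candidate: str) -> Path:
    """Resolve candidate against root; reject anything that escapes it."""
    base = root.resolve()
    resolved = (base / candidate).resolve()  # collapses any '..' components
    if resolved != base and base not in resolved.parents:
        raise SandboxError(f"refusing to touch {resolved}: outside sandbox")
    return resolved

def guarded_move(root: Path, src: str, dst: str) -> None:
    """Move a file, but only if both endpoints stay under the sandbox root."""
    shutil.move(str(resolve_inside(root, src)), str(resolve_inside(root, dst)))
```

A `mv` request routed through `guarded_move` cannot touch the user's home directory, whatever path string the model emits, because `..` components are resolved before the check rather than trusted.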
I think we agree then? the tech is useful; you need systems around them (like sandboxes and commit hooks that prevent leaking secrets) to use them effectively (along with learned skills)
very little software (or hardware) used in production is formally verified. tons of non-deterministic software (including neural networks) are operating in production just fine, including in heavily regulated sectors (banking, health care)
> I think we agree then? the tech is useful; you need systems around them (like sandboxes and commit hooks that prevent leaking secrets) to use them effectively (along with learned skills)
No.
> very little software (or hardware) used in production is formally verified. tons of non-deterministic software (including neural networks) are operating in production just fine, including in heavily regulated sectors (banking, health care)
The difference is what happens when it all goes wrong.

You have to explain exactly why a system failed in heavily regulated sectors.

Saying "everything is probabilistic" as the cause of an issue is a non-answer if you are a chip designer, air traffic controller, investment banker, or medical doctor.
So your point does not follow.
that’s not what I said. you honestly seem like you just want to argue about stuff (e.g. not elaborating on the “no” when I basically repeated and agreed with what you said). and you seem to consistently miss my point (in the second part of your response; I’m saying these non-deterministic neural networks are already widespread in industry with these regulations, and it’s fine. they can be explained despite your repeated assertions they cannot be. also the entire point on CPUs which you may have noticed I dropped from my responses because you seemed distracted arguing about it). this is not productive and we’re both clearly stubborn, glhf
> that’s not what I said. you honestly seem like you just want to argue about stuff (e.g. not elaborating on the “no” when I basically repeated and agreed with what you said). and you seem to consistently miss my point
I have repeated myself many times, and you keep ignoring the reliability problems that inherently impede LLMs, excluding them from use-cases where predictability in critical production systems is required.
Vibe coders can use them, but the gulf between useful-for-prototyping and useful-for-production is riddled with hard obstacles, because software like an LLM is fundamentally unpredictable and the risks are therefore far greater.
> I’m saying these non-deterministic neural networks are already widespread in industry with these regulations, and it’s fine.
So when a neural network scales beyond hundreds of layers and billions of parameters, i.e. to a production-grade LLM, explain exactly how a black box at that scale is explainable when it messes up and goes wrong?
> they can be explained despite your repeated assertions they cannot be.
With what methods exactly?
Early on, I pointed to formal verification and testing as the way we explain CPUs when they go wrong at scale. You have provided nothing equivalent for LLMs to back your own assertion; just "they can be explained", without any evidence.
> also the entire point on CPUs which you may have noticed I dropped from my responses because you seemed distracted arguing about it). this is not productive and we’re both clearly stubborn, glhf
You did not make any point with that, as it was a false equivalence, and I explained why the reliability of a CPU isn't the same as the reliability of an LLM.