Agree with this. Strange to me to frame the "training recall" as cheating (33 of the 38 cheating instances). Most people think of "cheating" as breaking rules. How is the LLM supposed to not use what was put into the weights?
While I probably wouldn't classify it as cheating, it is an even bigger signal of concern for model quality.
Cheating by breaking the rules at least implies some learned patterns.
Repeating training data verbatim for narrow cases like this implies that the model is overfitting.
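For what it's worth, "repeating verbatim" is something you can at least roughly measure. A minimal sketch, assuming you have the model's output and a known reference solution as plain strings: it finds the longest run of whitespace-separated tokens that appears in both. The function names and the whitespace tokenization are my own illustration, not anything the benchmark authors describe.

```python
from difflib import SequenceMatcher


def longest_verbatim_run(candidate: str, reference: str) -> int:
    """Length, in whitespace-separated tokens, of the longest contiguous
    token run that appears in both candidate and reference."""
    a = candidate.split()
    b = reference.split()
    # autojunk=False so frequent tokens are not silently ignored.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return match.size


def verbatim_fraction(candidate: str, reference: str) -> float:
    """Share of the candidate's tokens covered by that longest shared run.
    Hypothetical metric: what counts as 'too high' is a judgment call."""
    tokens = candidate.split()
    if not tokens:
        return 0.0
    return longest_verbatim_run(candidate, reference) / len(tokens)


reference = "def add(a, b): return a + b"
rewritten = "def total(x, y): result = x + y ; return result"

print(verbatim_fraction(reference, reference))
print(verbatim_fraction(rewritten, reference))
```

A high fraction only says the output matches the reference, not why. Short or highly idiomatic solutions will match without any memorization involved, so treat it as a flag for a closer look rather than proof.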
If we're evaluating a person, rote recall is not necessarily cheating. It's expected, but then you'd expect them to apply that rote-memorized information in a novel way later on and prove they understand how they applied their priors to the new situation.
Models don't actually reason in the same sense, so recalling by rote from their training data is "cheating" in the sense that the training data cheated, not the model. So many of those benchmarks have snaked their way into training data that they've become less useful as benchmarks. That, I think, is going to be a long-term difficulty in quantitatively assessing model quality and "intelligence." So it is cheating, in the sense of what we expect from the models and training data, but not in a human sense.
Memorization is NOT problem solving ability and many people care about the latter.
By writing a not-identical, but valid, solution? Any modestly complex engineering problem has many solutions.
This is an obvious example of why LLM training is so different than human learning.
I mean people expect a model to give a working solution. They also expect it to provide it in as few tokens as possible (input/output). They might expect it to come up with an original solution, but I don't think most people would compromise on the first two points.
I expect any well-informed corporate lawyer who has thought about this carefully is strongly advising that these tools not be used. When the LLM [0] barfs up some nontrivial code that's covered by the AGPL and your company's devs, entirely unaware of its provenance, put it into the company's "all rights reserved" codebase, it's going to be a nightmare to come back from that.
[0] ...that Nvidia's CEO says they should be spending 50% of a senior dev's salary per seat per year on...
The ship sailed on this a long time ago.
Oh definitely not. We're not yet solidly out of the "extremely exuberant hype" phase, so the folks that matter tend to not ask questions that dampen the mood.
Sorry to tell you friend, but LLMs have touched the vast majority of active codebases out there, whether you like it or not. You can tell yourself that you’re one of “the folks that matter” (lol) all you want, but we’re never going back.
That's what people told Ignaz Semmelweis, too, I assume. "Nothing you can do, the powers that be decided, you are a minority, you don't matter, lol!" Snickering in the shadow of what they won't confront at those who do.
Not a great analogy. A better analogy is to longbows and muskets/rifles. Longbows in the hands of a skilled user were much better weapons than early muskets, but muskets brought consistency, a lower skill floor and reduced ammunition cost. Fast forward a few hundred years and the modern incarnations of muskets make longbows look silly, and nobody would ever argue that you should go to war with longbows.
This isn't about "AI", this is about theft and abuse, and snickering under the thumb of a bully at those who call them out.
Rape was probably also "normal" for most of our history, now it's not. Early people who criticized it were probably told "what u gonna do?", too.
You don’t even know what we’re talking about in this thread, do you?
We’re talking about whether corporations are going to risk using LLMs in their codebase because of the theoretical legal risk that they might produce something that would fall under open source licenses, and be difficult to untangle later.
Regardless of what you think the morality is here, or what the legal situation turns out to be, this is already happening. The vast majority of corporate codebases are already “infected” by LLM outputs. Even at corporations where that’s not allowed, I promise there are devs using LLMs anyway.
Why repeat what you already said with more words, as if I can't read, only to leave out the bit that I responded to?
> we’re never going back.
As a prediction, this is worthless. If everybody thinks as you do, we won't; if nobody does, we will. So yes, this is purely about morality.
It's not just about collective agreement, there's a prisoner's dilemma in there.
If some segment of engineers uses agents and outperforms engineers who don't use agents, market forces will push all other engineers to use it over time. The only way we're going back is if we get concrete evidence that engineers using agents perform worse than engineers that don't, and that evidence isn't invalidated by improved models.
Well, perhaps we will be sent similarly to asylums for "anti-AI psychosis"
lol, yes, that’s a perfect analogy for whether corporations are going to use LLMs in their codebases.
> You can tell yourself that you’re one of “the folks that matter” (lol)...
kek. I'm a frequent commenter on HN. I'm definitely not one of the folks that matter.
> ...LLMs have touched the vast majority of active codebases out there...
I agree that LLM use is widespread. I disagree that LLMs have "touched the vast majority of active codebases".
Regardless, the courts are slow and Open Source license-violation cases are even slower. You seem like you'd be unaware of how terrified so many businesses are of having AGPL code deployed in their systems. In my professional experience, a great many businesses will refuse to deploy systems that contain AGPL-licensed utilities... even if those utilities are only used for internal housekeeping purposes, and their only remote communications method is a UNIX socket used for communications with a CLI control utility that can only be used when you're SSHed into the system. If they're aware of any AGPL'd code anywhere, they will not touch it.
No amount of LLM-provider-provided indemnification can save you from license obligations you've become bound to by creating and distributing a derivative work. People who are in the know know that these tools occasionally regurgitate nontrivial portions of their input data, verbatim. Such people also know that AGPL-licensed code is absolutely in their input data. I'd wager that getting a nontrivial amount of *GPL'd code plopped into your company's "all-rights-reserved" codebase by one of these tools is more likely than the typical US driver personally being in a nontrivial automobile collision.
In the US, people go their entire lives without getting in nontrivial automobile collisions, but they usually wear their seatbelts... even prior to widely-deployed surveillance cameras. I wonder why. It seems like an awful lot of boring, repetitive work for a thing that's really never going to happen to you in your lifetime.