> And by the way, training your monster on data produced in part by my own hands, without attribution or compensation.
> To the others: I apologize to the world at large for my inadvertent, naive if minor role in enabling this assault.
this is my position too, I regret every single piece of open source software I ever produced
and I will produce no more
That’s throwing the baby out with the bath water.
The Open Source movement has been a gigantic boon to the whole of computing, and it would be a terrible shame to lose that as a knee-jerk reaction to genAI
> That’s throwing the baby out with the bath water.
it's not
the parasites can't train their shitty "AI" if they don't have anything to train it on
You refusing to write open source will do nothing to slow the development of AI models - there's plenty of other training data in the world.
It will however reduce the positive impact your open source contributions have on the world to 0.
I don't understand the ethical framework for this decision at all.
> You refusing to write open source will do nothing to slow the development of AI models - there's plenty of other training data in the world.
There's also plenty of other open source contributors in the world.
> It will however reduce the positive impact your open source contributions have on the world to 0.
And it will reduce your negative impact through helping to train AI models to 0.
The value of your open source contributions to the ecosystem is roughly proportional to the value they provide to LLM makers as training data. Any argument you could make that one is negligible would also apply to the other, and vice versa.
> You refusing to write open source will do nothing to slow the development of AI models - there's plenty of other training data in the world.
if true, then the parasites can remove ALL code where the license requires attribution
oh, they won't? I wonder why
The ethical framework is simply this one: what is the worth of doing +1 to everyone, if the very thing you wish didn't exist (because you believe it is destroying the world) benefits x10 more from it?
If bringing fire to a species lights and warms them, but also gives the means and incentives to some members of this species to burn everything for good, you have every ethical freedom to ponder whether you contribute to this fire or not.
I don't think that a 10x estimate is credible. If it was I'd understand the ethical argument being made here, but I'm confident that excluding one person's open source code from training has an infinitesimally small impact on the abilities of the resulting model.
For your fire example, there's a difference between being Prometheus teaching humans to use fire compared to being a random villager who adds a twig to an existing campfire. I'd say the open source contributions example here is more the latter than the former.
The ethical issue is consent and normalisation: asking individuals to donate to a system they believe is undermining their livelihood and the commons they depend on, while the amplified value is captured somewhere else.
"It barely changes the model" is an engineering claim. It does not imply "therefore it may be taken without consent or compensation" (an ethical claim), nor "therefore it has no meaningful impact on the contributor or their community" (a moral claim).
Your argument applies to everything that requires a mass movement to change. Why do anything about the climate? Why do anything about civil rights? Why do anything about poverty? Why try to make any change? I'm just one person. Anything I could do couldn't possibly have any effect. You know what, since all the powerful interests say it's good, it's a lot easier to jump on the bandwagon and act like it is. All of those people who disagree are just luddites anyways. And the luddites didn't even have a point, right? They were just idiots who hated metallic devices for no reason at all.
> there's plenty of other training data in the world.
Not if most of it is machine generated. The machine would start eating its own shit. The nutrition it gets is from human-generated content.
> I don't understand the ethical framework for this decision at all.
The question is not one of ethics but that of incentives. People producing open source are incentivized in a certain way and it is abhorrent to them when that framework is violated. There needs to be a new license that explicitly forbids use for AI training. That may encourage folks to continue to contribute.
Saying people shouldn't create open source code because AI will learn from it, is like saying people shouldn't create art because AI will learn from it.
In both cases I get the frustration - it feels horrible to see something you created be used in a way you think is harmful and wrong! - but the world would be a worse place without art or open source.
> In both cases I get the frustration - it feels horrible to see something you created be used in a way you think is harmful and wrong! - but the world would be a worse place without art or open source.
Well maybe the AI parasites should have thought of that.
Guilt-tripping people into providing more fodder for the machine. That is really something else.
I'm not surprised that you don't understand ethics.
I'm trying to guilt-trip them into using their skills to improve the world through continuing to release open source software.
I couldn't care less if their code was used to train AI - in fact I'd rather it wasn't since they don't want it to be used for that.
given the "AI" industry's long term goals, I see contributing in any way to generative "AI" to be deeply unethical, bordering on evil
which is the exact opposite of improving the world
you can extrapolate to what I think of YOUR actions
I imagine you think I'm an accelerant of all of this, through my efforts to teach people what it can and cannot do and provide tools to help them use it.
My position on all of this is that the technology isn't going to be uninvented and I very much doubt it will be legislated away, which means the best thing we can do is promote the positive uses and disincentivize the negative uses as much as possible.
I don't see you as an accelerant
they're using your exceptional reputation as an open-source developer to push their proprietary parasitic products and business models, with you thinking you're doing good
I don't mean to be rude, but I suspect "useful idiot" is probably the term they use to describe open source influencers in meetings discussing early access
You know, I'm realizing in my head I'm comparing this to Nazism and Hitler. I'm sure many people thought he was bringing change to the world, and that since it was going to happen anyway we should all get on board with it. In the end there was a reckoning.
IMHO there are going to be consequences of these negative effects, regardless of the positives.
Looking at it in this light, you might want to get out now, while you still can. I'm sure it's going to continue, and it's not going to be legislated away, but it's still wrong to be using this technology in the way it's being used right now, and I will not be associated with the harmful effects this technology is being used for because a few corporations feel justified in pushing evil on to the world wrapped in positives.
Whoa, did not expect a Hitler comparison here.
I think of LLMs as more like the invention of cars or railways: enormous negative externalities, but provided enough benefit to humanity that we tend to think they were worthwhile.
Are the negatives of LLMs really that bad? Most of them look more like annoyances to me.
The ones that upset me the most are the ChatGPT psychosis episodes which have led to loss of life. I'm reassured by the fact that the AI labs are taking genuine steps to reduce the risk of that happening, which seems analogous to me to the development of car safety features.
Your post, full of well-formed English sentences, is also going to contribute to generative AI, so thanks for that.
oh I've thought of that :)
my comments on the internet are now almost exclusively anti-"AI", and anti-bigtech
Yes — That’s the bath water. The baby is all the communal good that has come from FLOSS.
OP is asserting that the danger posed by AI is far bigger than the benefit of FLOSS. So to OP AI is the bath water.
Yes, and they are okay with throwing the baby out with it, which is what the other commenter is commenting about. Throwing babies out of buckets full of bathwater is a bad thing, is what the idiom implies.
surely that cat's out of the bag by now; and it's too late to make an active difference by boycotting the production of more public(ly indexed) code?
Kind of, kind of not. Form a guild and distribute via SaaS or some other form that keeps the knowledge undistributable. Most code out there is terrible, so relying on AI trained on it will lose out.
If we end up with only proprietary software we are the ones who lose
GenAI would be decades away (if not more) with only proprietary software, which would never have reached the quality, coordination, and volume that open source enabled in such a relatively short time frame.
open source code is a minuscule fraction of the training data
I'd love to see a citation there. We already know from a few years ago that they were training AI based on projects on GitHub. Meanwhile, I highly doubt software firms were lining up to have their proprietary code bases ingested by AI for training purposes. Even with NDAs, we would have heard something about it.
I should have clarified what I meant. The training data includes roughly speaking the entire internet. Open source code is probably a large fraction of the code in the data, but it is a tiny fraction of the total data, which is mostly non-code.
My point was that the hypothetical of "not contributing to any open source code" to the extent that LLMs had no code to train on, would not have made as big of an impact as that person thought, since a very large majority of the internet is text, not code.
I'm sorry but your point doesn't make sense to me. Training on all the world's text but omitting code means that your machine won't know how to write code. That's an enormous impact, not a small one.
Unless you're in the camp that believes ChatGPT can extrapolate outside of its training data and do computer programming without having ever trained on any computer programming material?
fair point
Where did most of the code in their training data come from?
Free software has always been about standing on the shoulders of giants.
I see this as doing so at scale and thus giving up on its inherent value is most definitely throwing the baby out with the bathwater.
I'd rather the internet ceased to exist entirely, than contributing in any way to generative "AI"
This is just childish. This is a complex problem and requires nuance and adaptability, just as programming does. Yours is literally the reaction of an angsty 12 year old.
Such a reactionary position is no better than nihilism.
If God is Dead, do we have to rebuild It in the megacorps of the world whilst maximizing shareholder value?
I think you aren't recognizing the power that comes from organizing thousands, hundreds of thousands, or millions of workers into vast industrial combines that produce the wealth of our society today. We must go through this, not against it. People will not know what could be, if they fail to see what is.
this just sounds like some memes smashed together in the LHC. what is this even supposed to mean? AI is a technology that will inevitably be developed by humankind. all of this appeal to... populism? socialism?... is completely devoid of meaning in response to a discussion whose sine qua non is pragmatism at the very least.
Ridiculous overreaction.
It is. If not you, other people will write their code, maybe of worse quality, and the parasites will train on this. And you cannot forbid other people to write open source software.
> If not you, other people will write their code, maybe of worse quality, and the parasites will train on this.
this is precisely the idea
add into that the rise of vibe-coding, and that should help accelerate model collapse
everyone that cares about quality of software should immediately stop contributing to open source
Open source has been good, but I think the expanded use of highly permissive licences has completely left the door open for one sided transactions.
All the FAANGs have the ability to build all the open source tools they consume internally. Why give it to them for free and not have the expectation that they'll contribute something back?
Even the GPL allows companies to simply use code without contributing back, as long as it's unmodified or used through a network boundary. The AGPL has the former issue.
This goes against what Stallman believes in, but there's a need for an AGPL with a clause against closed-weight models.
At least the contribution back can happen. You're right though, it's not perfect.
How dare you chastise someone for making the personal decision not to produce free work anymore? Who do you think you are?
The promise and freedom of open source has been exploited by the least egalitarian and most capitalist forces on the planet.
I would never have imagined things turning out this way, and yet, here we are.
FLOSS is a textbook example of economic activity that generates positive externalities. Yes, those externalities are of outsized value to corporate giants, but that’s not a bad thing unto itself.
Rather, I think this is, again, a textbook example of what governments and taxation are for — tax the people taking advantage of the externalities, to pay the people producing them.
Yes, but unfortunately this never happens; and depressingly, I can't imagine it happening.
The open source movement has been exploited.
Open Source (as opposed to Free Software) was intended to be friendly to business and early FOSS fans pushed for corporate adoption for all they were worth. It's a classic "leopards ate my face" moment that somehow took a couple of decades for the punchline to land: "'I never thought capitalists would exploit MY open source,' sobs developer who advocated for the Businesses Exploiting Open Source movement."
I'm not sure I follow your line of reasoning.
The exploited are in the wrong for not recognising they're going to be exploited?
A pretty twisted point of view, in my opinion.
Perhaps you are unfamiliar with the "leopards ate my face" meme? https://knowyourmeme.com/memes/leopards-eating-peoples-faces... The parallels between the early FOSS advocates energetically seeking corporate adoption of FOSS and the meme are quite obvious.
I don't misunderstand what you're saying, but I think it's a twisted point of view.
"The power of accurate observation is commonly called cynicism by those who have not got it." - George Bernard Shaw
But it's frankly not true, despite your desire to appear sardonic
For instance, does Richard Stallman fit this mold?
RMS' vision depended on as much adoption of Free Software as possible to both starve proprietary software producers and to get contributions of copyleft source code rolling back in to the FOSS ecosystem to make it unstoppable and he knew full well that most software is produced and consumed by businesses. So, yes, RMS fits the mold; he needed capitalists to adopt Free Software wholesale but failed to foresee how one-sidedly exploitative the relationship would eventually be.
If you disagree, please explain how RMS and/or Perens do not fit the mold.
I think your analysis is too simplistic.
Stallman cared / cares mostly about user freedom, but was canny enough to understand that businesses would also need to be able to engage with this freedom too.
Copyleft licensing was put in place _because_ commercial exploitation was expected. It was designed to preserve user freedoms.
Compromise was built into the model.
But the unexpected twist was cloud; which broke the safety mechanism.
This is the reason I feel it's unfair to say that proponents of the movement are naive. Exploitation was predicted from the outset. A complex turn of events drew the shape of the current landscape.
And if blame is to be apportioned anywhere, it should be firmly at the feet of the corporations profiting.
As an aside, the first unexpected twist was actually Tivoization.
But, anyway, I do not think your analysis captures the situation in the right way. The free-rider problem, where users contribute no/negligible code or money back to FOSS, is the heart of the exploitation; Tivoization or cloud SaaS-ification are merely forms of free-riding. Other forms of the free-rider problem would have eventually become a thorn in the side of FOSS even if those two things had never happened and there is no way to plug that hole in the concept of copyleft.
And I maintain that was entirely predictable (and it was predicted by many a few decades ago!): there is no reason for a business owner to contribute back to FOSS when not contractually obligated to do so. Like the fable of the scorpion and the frog, even if it's valid to do so it's kind of pointless to blame capitalists for doing what everyone knew they were going to do all along.
But I don't agree that the onus should be on the open source movement to accept blame.
These corporations are not run by people who have no choice; they're run by people who choose to run the system to the absolute limit for absolute material gain.
Unfortunately as I see it, even if you want to contribute to open source out of a pure passion or enjoyment, they don't respect the licenses that are consumed. And the "training" companies are not being held liable.
Are there any proposals to nail down an open source license which would explicitly exclude use with AI systems and companies?
All licenses rely on the power of copyright and what we're still figuring out is whether training is subject to the limitations of copyright or if it's permissible under fair use. If it's found to be fair use in the majority of situations, no license can be constructed that will protect you.
Even if you could construct such a license, it wouldn't be OSI open source because it would discriminate based on field of endeavor.
And it would inevitably catch benevolent behavior that is AI-related in its net. That's because these terms are ill-defined and people use them very sloppily. There is no agreed-upon definition for something like gen AI or even AI.
Even if you license it prohibiting AI use, how would you litigate against such uses? An open source project can't afford the same legal resources that AI firms have access to.
I won't speak for all companies, but the ones I've worked for, large and small, have always respected licenses and were always very careful when choosing open source.
The fact that they could litigate you into oblivion doesn't make it acceptable.
Where is this spirit when AWS takes a FOSS project, puts it in the cloud and monetizes it?
It exists, hence e.g. AGPL.
But for most open source licenses, that example would be within bounds. The grandparent comment objected to not respecting the license.
The AGPL does not prevent offering the software as a service. It's got a reputation as the GPL variant for an open-core business model, but it really isn't that.
Most companies trying to sell open-source software probably lose more business if the software ends up in the Debian/Ubuntu repository (and the packaging/system integration is not completely abysmal) than when some cloud provider starts offering it as a service.
Fairly sure it's the same problem, and the main reason stronger licenses are appearing and formerly OSS companies are closing down their sources.
you are saying X, but a completely different group of people didn't say Y that other time! I got you!!!!
It’s fair to call out that both aspects are two sides of the same coin. I didn’t try to “get” anyone
um, no it's not. you have fallen into the classic web forum trap of analyzing a heterogeneous mix of people with inconsistent views as one entity that should have consistent views
> Unfortunately as I see it, even if you want to contribute to open source out of a pure passion or enjoyment, they don't respect the licenses that are consumed.
Because it is "transformative" and therefore "fair" use.
Running things through lossy compression is transformative?
The quotation marks indicate that _I_ don't think it is. Especially given that modern deep learning is over-parameterized to the point that it interpolates training examples.
Fair use is an exception to copyright, but a license agreement can go far beyond copyright protections. There is no fair use exception to breach of contract.
I imagine a license agreement would only apply to using the software, not merely reading the code (which is what AI training claims to do under fair use).
As an analogy, you can’t enforce a “license” that anyone that opens your GitHub repo and looks at any .cpp file owes you $1,000,000.
And then having vibe coders constantly lecture us about how the future is just prompt engineering, and that we should totally be happy to desert the skills we spent decades building (the skills that were stolen to train AI).
"The only thing that matters is the end result, it's no different than a compiler!", they say as someone with no experience dumps giant PRs of horrific vibe code for those of us that still know what we're doing to review.
If you're unhappy that bad people might use your software in unexpected ways, open source licenses were never appropriate for you in the first place.
Anyone can use your software! Some of them are very likely bad people who will misuse it to do bad things, but you don't have any control over it. Giving up control is how it works. It's how it's always worked, but often people don't understand the consequences.
>Giving up control is how it works. It's how it's always worked,
no, it hasn't. Open source software, like any open and cooperative culture, existed on a bedrock of what we used to call norms, back when we still had some in our societies and people acted, not always but at least most of the time, in good faith. Hacker culture (the word's in the name of this website), which underpinned so much of it, had many unwritten rules that people respected, even in companies, when there were still enough people in charge who shared at least some of the values.
Now it isn't just an exception but the rule that people will use what you write in the most abhorrent, greedy and stupid ways and it does look like the only way out is some Neal Stephenson Anathem-esque digital version of a monastery.
Open source software is published to the world and used far beyond any single community where certain norms might apply.
If you care about what people do with your code, you should put it in the license. To the extent that unwritten norms exist, it's unfair to expect strangers in different parts of the world to know what they are, and it's likely unenforceable.
This recently came up for the GPLv2 license, where Linus Torvalds and the Software Freedom Conservancy disagree about how it should be interpreted, and there's apparently a judge that agrees with Linus:
https://mastodon.social/@torvalds@social.kernel.org/11577678...
Inside open source communities maybe. In the corporate world? Absolutely not. Ever. They will take your open source code and do what they want with it, always have.
This varies. The lawyers for risk-averse companies will make sure they follow the licenses. There are auditing tools to make sure you're not pulling in code you shouldn't. An example is Google's go-licenses command [1].
But you can be sure that even the risk-averse companies are going to go by what the license says, rather than "community norms."
Other companies are more careless.
[1] https://github.com/google/go-licenses
It’s a fair point that AI training makes enforcing licences more difficult than other situations. My point is that licence issues like this aren’t really a technology issue; they’re a company greed/legal issue, because it’s always been the case.
People do not have perfect foresight, and the ways open source software is used have significantly shifted in recent years. As a result, people are reevaluating whether or not they want to participate.
Yes, very true.
It's not really people, and they don't really use the software.
People training LLMs on source code is sort of like using newspaper for wrapping fish. It's not the expected use, but people are still using it for something.
As they say, "reduce, reuse, recycle." Your words are getting composted.
Nothing says reduce and reuse like building huge quantities of GPUs and massive data centers to run AI models. It’s like composting!
It's kind of ironic since AI can only grow by feeding on data and open source with its good intentions of sharing knowledge is absolutely perfect for this.
But AI is also the ultimate meat grinder, there's no yours or theirs in the final dish, it's just meat.
And open source licenses are practically unenforceable for an AI system, unless you can maybe get it to cough up verbatim code from its training data.
At the same time, we all know they're not going anywhere, they're here to stay.
I'm personally not against them, they're very useful obviously, but I do have mixed or mostly negative feelings on how they got their training data.
I learned what I learned due to all the openness in software engineering, and not because everyone put it behind a paywall.
Might be because most of us got/get paid well enough that this philosophy works, or because our industry is so young, or because people writing code share good values.
It never worried me that a corp would make money out of some code I wrote, and it still doesn't. After all, I'm able to write code because I get paid well writing code, which I do well because of open source. Companies always benefited from open source code, attributed or not.
Now I use it to write more code.
I'm fine with that, though, and I would argue for laws forcing models to be opened up after x years. But I would just prefer the open source community coming together and creating better open models overall.
I've been feeling a lot the same way, but removing your source code from the world does not feel like a constructive solution either.
Some Shareware used to be individually licensed with the name of the licensee prominently visible, so if you had got an illegal copy you'd be able to see whose licensed copy it was that had been copied.
I wonder if something based on that idea of personal responsibility for your copy could be adopted to source code. If you wanted to contribute to a piece of software, you could ask a contributor and then get a personally licensed copy of the source code with your name in every source file... but I don't know where to take it from there. Has there ever been some system similar to something like that that one could take inspiration from?
Why? The core vision of free software and many open source licenses was to empower users and developers to make things they need without being financially extorted, to avoid having users locked in to proprietary systems, to enable interoperability, and to share knowledge. GenAI permits all of this to a level beyond just providing source code.
Most objections like yours are couched in language about principles, but ultimately seem to be about ego. That's not always bad, but I'm not sure why it should be compelling compared to the public good that these systems might ultimately enable.
> and I will produce no more
Nah, don't do that. Produce shitloads of it using the very same LLM tools that ripped you off, but license it under the GPL.
If they're going to thief GPL software, least we can do is thief it back.
That's a weird position to take. Open source software is actually what is mitigating this stupidity in my opinion. Having monopolistic players like Microsoft and Google is what brought us here in the first place.
What a miserable attitude. When you put something out in the world it's out there for anyone to use and always has been before AI.
it is (... was) there to use for anyone, on the condition that the license is followed
which they don't
and no self-serving sophistry about "it's transformative fair use" counts as respecting the license
The license only has force because of copyright. For better or for worse, the courts decide what is transformative fair use.
Characterizing the discussion behind this as "sophistry" is a fundamentally unserious take.
For a serious take, I recommend reading the copyright office's 100 plus page document that they released in May. It makes it clear that there are a bunch of cases that are non-transformative, particularly when they affect the market for the original work and compete with it. But there's also clearly cases that are transformative when no such competition exists, and the training material was obtained legally.
https://www.copyright.gov/ai/Copyright-and-Artificial-Intell...
I'm not particularly sympathetic to voices on HN that attempt to remove all nuance from this discussion. It's a challenging enough topic as is.
> For better or for worse, the courts decide what is transformative fair use.
thankfully, I don't live under the US regime
there is no concept of fair use in my country
OK, so what's the status in your country? What lawsuits have been filed, and what are the findings?
There's a huge political aspect here: copyright hasn't worked for decades (I've written about this at length), and this is the latest iteration in that erosion. Countries that enforce IP as a natural right are going to have trouble navigating the change: they either need to avoid AI entirely (this will have higher costs than many anticipate), or they need to revise how they think about copyright. Or they can just ignore it. There are no good options.
My instinct is that countries that embrace change will do better.
> Characterizing the discussion behind this as "sophistry" is a fundamentally unserious take
What a joke. Sorry, but no. I don't think it is unserious at all. What's unserious is saying this.
> and the training material was obtained legally
And assuming everyone should take it at face value. I hope you understand that going on a tech forum and telling people they aren't being nuanced because a Judge in Alabama that can barely unlock their phone weighed in on a massively novel technology with global implications, yes, reads deeply unserious. We're aware the U.S. legal system is a failure and the rest of the world suffers for it. Even your President routinely steals music for campaign events, and stole code for Truth Social. Your copyright is a joke that's only there to serve the fattest wallets.
These judges are not elected, they are appointed by people whose pockets are lined by these very corporations. They don't serve us, they are here to retrofit the law to make illegal things corporations do, legal. What you wrote is thought terminating.
What I wrote is an encouragement to investigate the actual state of the law when you're talking about legal topics. That's the opposite of thought-terminating.
*in your opinion
Was it ever open source if there was an implied refusal to create something you don't approve of? Was it only for certain kinds of software, certain kinds of creators? If there was some kind of implicit approval process or consent requirement, did you publish it? Where can that be reviewed?
> and I will produce no more
Thanks for your contributions so far but this won't change anything.
If you want to have a positive impact on this matter, it's better to pressure the government(s) to prevent GenAI companies from using content they don't have a license for, so they behave like any other business that came before them.
What people like Rob Pike don't understand is that the technology wouldn't be possible at all if creators needed to be compensated. Would you really choose a future where creators were compensated fairly, but ChatGPT didn't exist?
> What people like Abraham Lincoln don't understand is that the technology wouldn't be possible at all if slaves needed to be compensated. Would you really choose a future where slaves were compensated fairly, but plantations didn't exist?
I fixed it... Sorry, I had to, the quote template was simply too good.
"Too expensive to do it legally" doesn't really stand up as an argument.
Unequivocally, yes. There are plenty of "useful" things that can come out of doing unethical things; that doesn't make it okay. And, arguably, ChatGPT isn't nearly as useful as it is good at convincing you it is.
Absolutely. Was this supposed to be some kind of gotcha?
> Would you really choose a future where creators were compensated fairly, but ChatGPT didn't exist?
Yes.
I don't see how "We couldn't do this cool thing if we didn't throw away ethics!" is a reasonable argument. That is a hell of a thing to write out.
Yes, very much so. I am in favour of pushing into the future as fast as we can, so to speak, but I think ChatGPT is a temporary boost that is going to slow us in the long run.
Very much yes, how can I opt into that timeline?
Yes, what a wild position to prefer the job loss, devaluation of skills, and environmental toll of AI to open source creators having been compensated in some better manner.
That would be like being able to keep my cake and eat it too. Of course I would. Surely you're being sarcastic?
Uh, yeah, he clearly would prefer it didn’t exist even if he was compensated.
Er... yes? Obviously? What are you even asking?
Yes.
Um, please let your comment be sarcastic. It is ... right?
Yes.
Yes.
Well yeah.