It is enforceable; I think you mean to say that it cannot be prevented, since people can attempt to hide their usage. Most rules and laws are like that: you proscribe some behavior, but that doesn't prevent people from doing it. That's why you typically also need to define punishments:
> This policy is not open to discussion, any content submitted that is clearly labelled as LLM-generated (including issues, merge requests, and merge request descriptions) will be immediately closed, and any attempt to bypass this policy will result in a ban from the project.
What happens when the PR is clear, reasonable, short, checked by a human, clearly fixes, implements, or otherwise improves the code base, and has no alternative implementation that is meaningfully different from the one presented?
If you're going to set a firm "no AI" policy, then my inclination would be to treat that kind of PR in the same way the US legal system does evidence obtained illegally: you say "sorry, no, we told you the rules and so you've wasted effort -- we will not take this even if it is good and perhaps the only sensible implementation". Perhaps somebody else will eventually re-implement it later without looking at the AI PR.
How funny would it be if the path to actually implementing that thing is then cut off because of a PR that was submitted with the exact same patch. I'm honestly sitting here grinning at the absurdity. Some things can only be done a certain way, especially when you're working with third-party libraries and APIs. The name of the function is the name of the function. There's no way around it.
That's why I said "somebody else, without looking at it". Clean-room reimplementation, if you like. The functionality is not forever unimplementable, it is only not implementable by merging this AI-generated PR.
It's similar to how I can't implement a feature by copying and pasting the obvious code from some commercially licensed project. But somebody else could write basically the same thing independently, without knowing about the proprietary-licensed code, and that would be fine.
It follows the same reasoning as when someone purposefully copies code from one codebase into another where the license doesn't allow it. Yes, it might be the only viable solution, and most likely no one will ever know you copied it, but if you are found out, most maintainers will not merge your PR.
You not realizing how ridiculous this is, is exactly why half of all devs are about to get left behind.
Like, this should be enshrined as the quintessential “they simply, obstinately, perilously, refused to get it” moment.
Soon, no one is going to care about anyone's bespoke manual keyboard entry of code if it takes 10 times as long to produce the same functionality with imperceptibly less error.
How would you tell that it's LLM-generated in that case?
If the submitter is prepared to explain the code and vouch for its quality then that might reasonably fall under "don't ask, don't tell".
However, if LLM output is either (a) uncopyrightable or (b) considered a derivative work of the source that was used to train the model, then you have a legal problem. And the legal system does care about invisible "bit colour".
It's (c): copyright of the operator.
For one simple reason: intention.
Here's some code for example: https://i.imgur.com/dp0QHBp.png
Both sides written by an LLM. Both sides written based on my explicit prompts explaining exactly how I want it to behave, then testing, retesting, and generally doing all the normal software eng due diligence necessary for basic QA. Sometimes the prompts are explicitly "change this variable name" and it ends up changing 2 lines of code no different from a find/replace.
Also I'm watching it reason in real time by running terminal commands to probe runtime data and extrapolate the right code. I've already seen it fix basic bugs because an RFC wasn't adhered to perfectly. Even leaving a nice comment explaining why we're ignoring the RFC in that one spot.
Eventually these arguments get kinda exhausting. People will use it to build stuff, and the stuff they build ends up retraining it, so we're already hundreds of generations deep on the retraining, and talking about licenses at this point feels absurd to me.
I think you need to read the report from the US Copyright Office that specifically says that it is *not* (c), copyright of the operator.
It doesn't matter if the "change this variable name" instruction ends up with the same result as a human operator using a text editor.
There is a big difference between "change this variable name" and "refactor this code base to extract a singleton".
The problem is that even if the code is clear and easy to understand AND it fixes a problem, it still might not be suitable as a pull request. Perhaps it changes the code in a way that would complicate other work, in progress or planned, and wouldn't just be a simple merge. Perhaps it creates a vulnerability somewhere else, or additional cognitive load to understand the change. Perhaps it adds a feature the project maintainer specifically doesn't want to add. Perhaps it simply takes up too much of their time to look at.
There are plenty of good reasons why somebody might not want your PR, independent of how good or useful to you your change is.
This is where most reasonable people would say “OK, fine”
CLEARLY, a lot of developers are not reasonable
I think the bigger point about enforcement is not whether you're able to detect "content submitted that is clearly labelled as LLM-generated", but that banning presumes you can identify the origin. That is: any individual contributor must be known to have (at most) one identity.
Once identity is guaranteed, privileges basically come down to reputation — which in this case is a binary "you're okay until we detect content that is clearly labelled as LLM-generated".
[Added]
Note that identity (especially preventing duplicate identities) is not easily solved.