> It’s important in the fair use assessment to understand that the training itself is fair use,

I think that this is a distinction many people miss.

If you take all the works of Shakespeare and reduce them to tokens and vectors, is it Shakespeare or is it factual information about Shakespeare? It is the latter, and as much as organizations like the MLB might want to be able to copyright a fact, you simply cannot do that.

Take this one step further. If you buy the work and vectorize it, that's fine. But if you feed in the vectors for Harry Potter so many times that the model can reproduce half of the book, it becomes a problem when it spits out that copy.

And what about all the other stuff that LLMs spit out? Who owns that? Well, at present, no one. If you train a monkey or an elephant to paint, you can't copyright that work because they aren't human, and neither is an LLM.

If you use an LLM to generate your code at work, can you leave with that code when you quit? Does GPLv3 or something like the Elasticsearch license even apply if there is no copyright?

I suspect we're going to be talking about court cases a lot for the next few years.

Yes. Someone on this post mentioned that Switzerland allows downloading copyrighted material but not distributing it.

So things get even darker, because what counts as distribution can have a really vague definition, and maybe the AI companies will follow the law only just barely, for the sake of not getting hit with a lawsuit like this again. But I wonder if all this case did was compensate the authors this one time. I doubt we will see a meaningful change in AI companies' attitudes towards fair use, which essentially amounts to exploiting authors.

I feel like they would try to use as much legalese as possible to extract as much from authors (legally) without compensating them, which I feel is unethical, but sadly the law doesn't work on ethics.

Switzerland has five main collecting societies: ProLitteris for literature and visual arts, the SSA (Société Suisse des Auteurs) for dramatic works, the SUISA for music, Suissimage for audiovisual works, and SWISSPERFORM for related rights like those of performers and broadcasters. These non-profit societies manage copyright and related rights on behalf of their members, collecting and distributing royalties from users of their works.

Note that the law specifically regulates software differently, so what you cannot do is just willy-nilly pirate games and software.

What distribution means in this case is defined in Swiss law. However, Swiss law as a whole is in some ways vague, which leaves a lot up to interpretation by the judiciary.

> compensate the authors this one time.

I would assume it would compensate the publisher. Authors often hand ownership to the publisher; there would be obvious exceptions for authors who do well.

> And what about all the other stuff that LLMs spit out? Who owns that? Well, at present, no one. If you train a monkey or an elephant to paint, you can't copyright that work because they aren't human, and neither is an LLM.

This seems too cute by half. Courts are generally far more common sense than that in applying the law.

This is like saying using `rails generate model Example` results in a bunch of code that isn't yours, because the tool generated it according to your specifications.

> courts are generally far more common sense than that in applying the law.

'The Board’s decision was later upheld by the U.S. District Court for the District of Columbia, which rejected the applicant’s contention that the AI system itself should be acknowledged as the author, with any copyrights vesting in the AI’s owner. The court further held that the CO did not act arbitrarily or capriciously in denying the application, reiterating the requirement that copyright law requires human authorship and that copyright protection does not extend to works “generated by new forms of technology operating absent any guiding human hand, as plaintiff urges here.”' From: https://www.whitefordlaw.com/news-events/client-alert-can-wo...

The court is using common sense when it comes to the law. It is very explicit and always has been... That word "human" has some long-standing, sticky legal meaning (as opposed to things that were "property").

The example is a real legal case afaik, or perhaps paraphrased from one (don’t think it was a monkey - an ape? An elephant?).

I’d guess the legal scenario for `rails generate` is that you have a license to the template code (by way of how the tool is licensed) and the template code was written by a human so licensable by them and then minimally modified by the tool.

I think you're thinking of this case [1]. It was a monkey, and it wasn't a painting but a selfie. A painting would have only made the no-copyright argument stronger.

[1] https://en.wikipedia.org/wiki/Monkey_selfie_copyright_disput...

I don’t think the code you get from `rails generate` is yours. Certainly not by way of copyright, which protects original works of authorship: if it’s not original, it’s not copyrightable. And yes, it’s been decided in US courts that non-human authorship doesn’t count as creative.

> If you take all the works of Shakespeare, and reduce it to tokens and vectors is it Shakespeare or is it factual information about Shakespeare?

To rephrase the question:

Is a PDF of the complete works of Shakespeare Shakespeare, or is it factual information about Shakespeare?

Reencoding human-readable information into a form that's difficult for humans to read without machine assistance is nothing new.

Like most things in law, the answers are going to come down to intent and outcome. If you distribute the PDF to other people with the intent that they can read the copyrighted works of an author, then you have distributed that author's content in violation of copyright. If, on the other hand, you encrypted the entire contents of that PDF, threw away the encryption key, and then published prints of the PDF as artwork of binary code, that's probably going to fall on the side of "fair use" even though the entire copyrighted work is input to and contained in your final output. Though you might get into some legal hot water if you promoted your work using the author's name, but that's more of a trademark issue than a copyright issue.
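To make the re-encoding point concrete, here is a minimal Python sketch of the two cases. Everything in it is my own illustration, not anything from the case: `reencode`, `decode`, and `encrypt_and_discard_key` are made-up names, zlib plus base64 stands in for "a PDF", and a one-time XOR pad stands in for "encrypted and threw away the key".

```python
import base64
import secrets
import zlib


def reencode(text: str) -> str:
    """Losslessly re-encode text into a form that is hard to read by eye."""
    compressed = zlib.compress(text.encode("utf-8"))
    return base64.b64encode(compressed).decode("ascii")


def decode(blob: str) -> str:
    """Recover the original text from the re-encoded form."""
    return zlib.decompress(base64.b64decode(blob)).decode("utf-8")


def encrypt_and_discard_key(text: str) -> bytes:
    """XOR the text with a random one-time key that is never stored or returned."""
    data = text.encode("utf-8")
    key = secrets.token_bytes(len(data))  # hypothetical stand-in for a thrown-away key
    return bytes(d ^ k for d, k in zip(data, key))


sample = "Shall I compare thee to a summer's day?"

# Case 1: unreadable to a human, but the work comes back intact.
blob = reencode(sample)
restored = decode(blob)

# Case 2: the work was the input, but without the key nothing recovers it.
ciphertext = encrypt_and_discard_key(sample)
```

In the first case the full text is one function call away, which is why it behaves like distributing the work. In the second the output depends on the work but cannot be turned back into it, which is the situation I'd expect to be treated differently.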

> Like most things in law, the answers are going to come down to intent and outcome. If you distribute the PDF...

I wasn't talking about distribution, and neither was the person whom I was replying to. But, thanks for wasting your time on publishing the rest of your comment, I guess.

The question is going to be how much human intellectual input there was, I think. I don't think it will take much: you can write the crappiest novel on earth that is complete random drivel and you still have copyright on it.

So to me, if you are doing literally any human review, editing, or control over the AI, then I think you'll retain copyright. There may be a risk that if somebody can show that they could produce exactly the same thing from a generic prompt with no interaction, then you may be in trouble, but let's face it, should you have copyright at that point?

This is, however, why I favor stopping slightly short of full agentic development at this point. I want the human watching each step and an audit trail of the human interaction in doing it. Sure I might only get to 5x development speed instead of 10x or 20x but that is already such an enormous step up from where we were a year ago that I am quite OK with that for now.

I mean, sort of. The issue is that the compression is novel. So anything post-tokenization could arguably be considered value-add and not necessarily a derivative work.