> "We still have no legal conclusion on whether AI model generated code, that is trained on all publicly available source (irrespective of type of license), is legal or not."

it depends on the country you are in

but overall, in the US, judges have fairly consistently ruled the training itself to be legal

and this is extremely unlikely to change or to be interpreted differently in practice

but where things are more complex is:

- the model containing training data (instead of generic abstractions derived from it), determined by whether or not it can be coaxed into producing close-to-verbatim output of the training data in question

- the model producing close-to-verbatim training data

the latter seems to be mostly (always?) treated as a copyright violation, with the complication that the person committing the violation (i.e. the one using the produced output) might not even know it

the former could mean that not just the output but the model itself counts as a form of database containing copyright-violating content. In that case the model provider would have to remove it, which is technically impossible(1)... The pain point with that approach is that it would likely kill public models, while privately kept models would in every such case put in a filter, _claim_ to have removed the content, and likely get away with it. So while IMHO it should conceptually be a violation, it is probably better if it isn't.
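To make the "put in a filter" idea concrete: the simplest version of such a filter checks generated output for long n-gram overlaps with known training documents. A minimal sketch (the corpus, the threshold of 8 tokens, and whitespace tokenization are all hypothetical choices; real systems would need hashing or suffix structures to scale):

```python
def ngrams(tokens, n):
    # all contiguous n-token windows of the token list
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def is_near_verbatim(output: str, corpus: list[str], n: int = 8) -> bool:
    # flag output that shares any long n-gram with a known document
    out_grams = ngrams(output.split(), n)
    return any(out_grams & ngrams(doc.split(), n) for doc in corpus)

corpus = ["the quick brown fox jumps over the lazy dog near the river bank"]
print(is_near_verbatim("he said the quick brown fox jumps over the lazy dog today", corpus))
print(is_near_verbatim("a completely original sentence about foxes", corpus))
```

Note that a filter like this only suppresses verbatim reproduction at inference time; it does nothing to remove the content from the model weights themselves, which is exactly why "claiming to have removed it" and actually removing it are different things.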

But also, the case the original article refers to is more about models interacting with/operating on a code base than about models being trained on one.

(1): Technically impossible for the model weights themselves; content in knowledge bases used by LLMs (e.g. retrieval systems) is very much removable.