I do not agree with your interpretation of copyright law. It does ban copies, but there has to be information flow from the original to the copy for it to count as a "copy." Independent creation of the same content is often taken by the courts as a sign that the content is purely functional, dictated by the requirements and mathematical constraints rather than copied.
Patent law is different and doesn't rely on information flow in the same way.
Derivative works can also run afoul of copyright. An LLM trained on a corpus of copyrighted code is creating derivative works no matter how obscure the process is.
That actually isn't what legal precedent currently says. Courts are currently looking at the actual output, not at whether the model itself is "tainted." If you think this is morally wrong, look into getting the laws changed (serious).
What about a human with 30 years of experience working with copyrighted codebases?
Said human would likely not be able to create a clean-room implementation of any of the codebases they worked on.
U.S. District Judge William Alsup ruled that Anthropic made "fair use" of books, deeming the use "exceedingly transformative":
"Like any reader aspiring to be a writer, Anthropic's LLMs trained upon works not to race ahead and replicate or supplant them — but to turn a hard corner and create something different"
I disagree that information flow is required. Do you have a reference for that? Certainly it is an important consideration. But consider all the real literary works contained in the infinite Library of Babel.[1] Are they original works just because no copy was used to produce them?
[1]: https://libraryofbabel.info/
Yes; the works are original.
However, describing the path you need to get there is itself copyright infringement: the library's addressing scheme is invertible, so an address encodes the full text of the page it points to.
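To see why an address carries the same information as the page, here is a minimal sketch of an invertible addressing scheme. This is a hypothetical illustration, not libraryofbabel.info's actual algorithm: it simply treats each page as a number written in base 29 over the library's 29-character alphabet, so the "address" and the text are two encodings of the same value.

```python
# Hypothetical sketch: a bijective address <-> page mapping.
# Alphabet of 29 symbols (26 letters, space, comma, period), as in Borges' library.
ALPHABET = "abcdefghijklmnopqrstuvwxyz ,."

def address_of(text: str) -> int:
    """Encode a page of text as its numeric address (interpret text as base-29)."""
    n = 0
    for ch in text:
        n = n * 29 + ALPHABET.index(ch)
    return n

def page_at(address: int, length: int) -> str:
    """Decode an address back into the page text it encodes."""
    chars = []
    for _ in range(length):
        address, digit = divmod(address, 29)
        chars.append(ALPHABET[digit])
    return "".join(reversed(chars))

text = "hello, library"
addr = address_of(text)
# The round-trip is exact: the address is just the text in another notation,
# so communicating the address communicates the work.
assert page_at(addr, len(text)) == text
```

Because the mapping is a bijection, "I only gave you the address, not the text" is a distinction without a difference, which is the point being made above.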
Well, discovery might be a fun exercise, to see whether the code is in the LLM's training dataset.
if?