It doesn't even matter whether the LLM was exposed to the code during training. A clean-room rewrite can be done by having one LLM produce a highly detailed analysis of the target (reverse engineering it if it's only available in binary form), then providing that analysis to a second LLM as the basis for an implementation.
It doesn't matter for the LLM writing the analysis.
It does matter for the one who implements it.
Finding an LLM that's good enough to do the rewrite while being able to prove it wasn't exposed to the original GPL code is probably impossible.
Why does it need two LLMs? LLMs aren't people. I'm not even sure it needs to be done in two separate contexts.
It doesn't have to be two LLMs, but nowadays LLMs have auto-memory, which means it could be argued that the same LLM doing both the analysis and the reimplementation isn't "clean". And the entire purpose of the "clean" part is to avoid that argument.
Agreed. But even then I don't see the problem: multiple LLMs could work on the same project.