But that's not really what danlitt said, right? They didn't claim it's impossible for an LLM to generate something different, merely that it's not a clean-room implementation, since the LLM, one must assume, was trained on the code it's re-implementing.
But an LLM has seen millions (?) of other codebases too. If you give it a functional spec, it has no reason to prefer any one of those codebases in particular, except perhaps if it has seen the original spec (assuming one can be read from public sources) associated with the old implementation, and the new spec is a copy of the old one.
Yes, if you are solving the exact problem the original code solved, and that original code was labeled as solving that exact problem, then that's a very good reason for the LLM to produce that code.
Researchers have shown that an LLM was able to reproduce the verbatim text of the first 4 Harry Potter books with 96% accuracy.
> that an LLM was able to reproduce the verbatim text of the first 4 Harry Potter books with 96% accuracy.
Kinda weird argument. In their research (https://forum.gnoppix.org/t/researchers-extract-up-to-96-of-...) the LLM was explicitly asked to reproduce the book. There are people out there who can do that without an LLM; by this logic, everything they write is copyright infringement, and so is every book they can recite.
> Yes if you are solving the exact problem that the original code solved and that original code was labeled as solving that exact problem then that’s very good reason for the LLM to produce that code.
I think you're overestimating LLM ability to generalize.
I guess the text of Harry Potter was used as training material in one big chunk. That would be a copyright violation.
This is not an argument against coding in a different language, though. It would be like having it retell Harry Potter in a different language, with different main character names and reshuffled plot points.
Well, if you’re coding it in Zig, and it’s barely seen any Zig, then how exactly would that argument hold up in that case?