Well, how did they rewrite it? If you do it in two phases, it should be fine, right?
Phase 1: extract requirements from original product (ideally not its code).
Phase 2: implement them without referencing the original product or code.
I wrote a simple "clean room" LLM pipeline, but the requirements just ended up being an exact description of the code, which defeated the purpose.
My aim was to reduce bloat, but my system had the opposite effect, because it replicated all the incidental crap and then added even more "enterprisey" crap on top of it.
I am not sure if it's possible to solve it with prompting. Maybe telling it to derive the functionality from the code? I haven't tried that, and I'm not sure how well it would work.
I think this requirements phase probably cannot be automated very effectively.
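For reference, the two phases above can be sketched roughly like this. It is a minimal sketch, not what I actually ran: the `LLM` callable, the prompts, and the `leaked_identifiers` check are all hypothetical names made up for illustration. The check is one crude way to notice when the requirements are just a description of the code: it flags snake_case or camelCase names from the original source that show up in the requirements text.

```python
import re
from typing import Callable

# Hypothetical type: any function that maps a prompt string to a completion string.
LLM = Callable[[str], str]

_TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _code_like_names(text: str) -> set[str]:
    """Collect tokens that look like identifiers (snake_case or mixed case)."""
    names = set()
    for tok in _TOKEN.findall(text):
        if len(tok) < 3:
            continue
        mixed_case = tok not in (tok.lower(), tok.upper(), tok.capitalize())
        if "_" in tok or mixed_case:
            names.add(tok)
    return names


def extract_requirements(llm: LLM, behaviour_notes: str) -> str:
    """Phase 1: derive requirements from observed behaviour, ideally not from code."""
    prompt = (
        "List the externally observable requirements of this product. "
        "Do not describe internal structure.\n\n" + behaviour_notes
    )
    return llm(prompt)


def leaked_identifiers(requirements: str, original_source: str) -> set[str]:
    """Return code-like names from the original that also appear in the requirements."""
    return _code_like_names(original_source) & _code_like_names(requirements)


def implement(llm: LLM, requirements: str) -> str:
    """Phase 2: the implementer is given the requirements only, never the original."""
    return llm("Implement the following requirements:\n\n" + requirements)


def clean_room(llm: LLM, behaviour_notes: str, original_source: str) -> str:
    """Run both phases, refusing to continue if the requirements leak identifiers."""
    requirements = extract_requirements(llm, behaviour_notes)
    leaks = leaked_identifiers(requirements, original_source)
    if leaks:
        raise ValueError(f"requirements mention original identifiers: {sorted(leaks)}")
    return implement(llm, requirements)
```

This only catches copied names, not copied structure, so requirements that restate the code line by line in plain English would still pass.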
How do you do phase 2 with an LLM when the LLM is likely trained on the original source code? Isn't this the equivalent of "rewriting" Harry Potter by describing the plot to an LLM trained on the original books[1]?
[1] https://arstechnica.com/features/2025/06/study-metas-llama-3...
Well, check out the "clean rewrite" design document, directly: https://github.com/chardet/chardet/commit/f51f523506a73f89f0... referenced in https://github.com/chardet/chardet/issues/327#issuecomment-4...
Writing in a plan "no GPL/LGPL code" does not actually mean "forget all the GPL/LGPL code that you have ever seen, so that you start from a clean slate".
Agreed. No amount of system/user prompt directives changes the fact that the LLM has already been trained on copyrighted code. It's amazing how many people fail to grasp that.
This is the "Don't think of a pink elephant" fallacy all over again.