How do you prove the training data didn't contain the code?
I'd assume an LLM trained on the original would also be contaminated.