How do you prove the training data didn't contain the code?

I'd assume an LLM trained on the original would also be contaminated.