That's why he is saying it's not equivalent. For it to be the same, the LLM would have to train on and transform Minecraft's source code into its weights, and then you would prompt the LLM to make a game matching Minecraft's specifications solely through prompts. Of course it's copyright infringement if you just give a tool Minecraft's source code and tell it to copy it, just as it would be copyright infringement if you used a copier to duplicate Minecraft's source code into a new document and said you had recreated Minecraft.
What if Copilot was already trained with Minecraft code in the dataset? It should be possible to test by telling the model to continue a snippet from the leaked code, the same way a news website proved its articles were used for training.
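A minimal sketch of the scoring half of such a probe: feed the model a prefix from the leaked code and measure how much of the true continuation it reproduces verbatim. The model call itself is left as a placeholder (`model_complete` is hypothetical; swap in whatever completion API you actually use).

```python
def model_complete(prefix: str) -> str:
    # Placeholder: a real probe would call the model under test here.
    raise NotImplementedError

def verbatim_overlap(predicted: str, actual: str) -> float:
    """Fraction of the true continuation's tokens that the model
    reproduced as an exact leading match (0.0 to 1.0)."""
    pred_tokens = predicted.split()
    true_tokens = actual.split()
    match = 0
    for p, t in zip(pred_tokens, true_tokens):
        if p != t:
            break
        match += 1
    return match / len(true_tokens) if true_tokens else 0.0

# e.g. verbatim_overlap("for (int i = 0;", "for (int i = 0;") -> 1.0
```

A long verbatim continuation of rarely duplicated code is strong evidence the snippet was in the training set; a short or zero overlap proves nothing either way, since models often paraphrase even memorized text.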
I feel as though the fact that this is even an open question shows how transformative the process is; clearly, while the LLM gains a general ability to code from its training corpus, the data gets so transformed that it's difficult to tell what exactly it was trained on beyond "a large body of code."
Then the training itself is the legal question. This doesn't seem all that complicated to me.
Is there a legal distinction between training, post-training, fine-tuning, and filling up a context window?
In all of these cases an AI model takes a copyrighted source, reads it, jumbles the bytes, and stores the result as vectors.
Later a query reads those vectors and produces output that may or may not resemble the original.
Judges have previously ruled that training counts as sufficiently transformative to qualify for fair use: https://www.whitecase.com/insight-alert/two-california-distr...
I don't know of any rulings on the context window, but it's certainly possible judges would rule that it does not qualify as transformative.
It's not equivalent, but it's close enough that you can't easily dismiss it.