> So would the model work if it only trained on the top 10% of pixels in every image? Or do they in fact need the entire image before they begin processing it, and therefore use the entire image?

The model trains on whatever features of an image are clear and observable enough to extract, roughly the same features a human could make sense of, and generation then draws on those observations. The 10% was just an arbitrary number to describe proportions. If the generation reproduced 100% of the observations from a single image, the model would be overfitting, and most people would deem the output a copy.
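To make the overfitting point concrete, here's a toy sketch (all names and values are invented for illustration): a "model" that stores every pixel can only reproduce its training images verbatim, i.e. copy them, while one that keeps only summary features cannot recover the original at all.

```python
# Hypothetical toy data: fake "images" as short lists of pixel values.
training_images = {
    "sunset": [200, 120, 40, 35],
    "forest": [30, 140, 25, 60],
}

# Overfit "model": memorizes everything, so generation == exact copy.
memorized = dict(training_images)
assert memorized["sunset"] == training_images["sunset"]  # a verbatim copy

# Feature-based "model": keeps only an aggregate observation
# (here, average brightness), so the original pixels can't be recovered.
features = {name: sum(px) / len(px) for name, px in training_images.items()}
print(features["sunset"])  # one summary number per image, not the image itself
```

Obviously real generative models sit between these extremes, but the spectrum is the same: the closer generation gets to reproducing one training image's observations, the closer it is to the lookup table.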

> Just because a race car uses kinetic energy, gravity, and friction to propel itself, the same way a human does, doesn't mean it's doing the same thing as a human.

WTF does this even mean? A race car exploits Newtonian physics the same way a human exploits gravity to train their muscles, knowingly or not. But you don't see car makers or humans paying rent to Newton for having discovered gravity. Come on!