But what if they were also scraping, for example, Netflix content to use as part of their training set?
There were some tweets the other day about how Midjourney could be prompted to reproduce some frames of the film Dune almost exactly. It wouldn't be shocking if these companies were using large databases of movies of questionable legal status.
I see this a lot, and they very well may be. But watch any behind-the-scenes documentary about an artsy movie, and nine times out of ten the director will be waxing poetic about their inspirations, which often include older movies or paintings with uncannily similar scenes or frames. So it also wouldn't be shocking if a model trained on the same inspirations as the filmmakers generated near-exact copies of frames from the movie.