This is just the model converging on some kind of average of its training data distribution. Here you can see the same thing starting from Dwayne Johnson and converging to some kind of digital neo-expressionist doodle: https://www.reddit.com/r/ChatGPT/comments/1kbj71z/i_tried_th...

If there's a hint of sepia in the original image and the training data contains a lot of sepia images, it will get reinforced at every step of this process. And the original distracted boyfriend meme does have some strong sepia tones in the background. In the same way, Dwayne Johnson's face starts out looking a tad cartoonish, and that gets amplified too. In the intermediate steps, both drift towards some averaged human representation that seems pretty accurate if you consider the real world's ethnic distribution.
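
To make the intuition concrete, here's a toy sketch of the argument. It is not how any real image model works: I'm assuming an image can be boiled down to two made-up features (sepia tint and cartoonishness, both 0..1), that the training data has some mean for each, and that every "recreate this image" pass pulls the features a fixed fraction of the way toward that mean. The names `regenerate`, `iterate`, `PRIOR_MEAN` and the `pull` value are all hypothetical.

```python
def regenerate(features, prior_mean, pull=0.2):
    """One toy regeneration pass: move each feature a fraction `pull`
    of the way toward the (assumed) training-data mean."""
    return [(1 - pull) * x + pull * m for x, m in zip(features, prior_mean)]


def iterate(features, prior_mean, steps=50, pull=0.2):
    """Feed the output back in as the next input, keeping every step."""
    history = [features]
    for _ in range(steps):
        features = regenerate(features, prior_mean, pull)
        history.append(features)
    return history


# Hypothetical features: [sepia tint, cartoonishness]
PRIOR_MEAN = [0.6, 0.7]                  # assumed bias of the training data
meme = iterate([0.1, 0.0], PRIOR_MEAN)   # slight sepia hint to start with
rock = iterate([0.0, 0.1], PRIOR_MEAN)   # slightly cartoonish to start with
```

Under these assumptions the small initial sepia value in `meme` grows on every pass, and both starting points end up at essentially the same place, `PRIOR_MEAN`, no matter where they began. That's the "converging on an average" effect in miniature.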