> I'm confused about how it gets things like the floor colour and clothing colour correct.
The model was trained on images of that same room.
It would produce images of the room even without any WiFi data input at all.
The WiFi data is used as a modulator on the input to the pretrained model.
It’s not actually generating an image of the room from only WiFi signals.
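
To make that concrete, here's a toy sketch of the idea. It is not the actual model: `ROOM_PRIOR`, `W_COND`, and `generate` are names I made up, and the 8x8 "image" and 16-value WiFi vector are arbitrary sizes. The point is only that the room's appearance lives in the trained weights, and the WiFi input just nudges the output.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for what a generator trained on one room has learned:
# the floor colour, walls, etc. are baked in here, not derived from WiFi.
ROOM_PRIOR = rng.uniform(0.0, 1.0, size=(8, 8, 3))

# Hypothetical conditioning weights: map a 16-value WiFi feature vector
# to a small per-pixel adjustment of the prior image.
W_COND = rng.normal(0.0, 0.01, size=(16, 8 * 8 * 3))


def generate(wifi_features=None):
    """Return an 8x8 RGB image; with no WiFi input the adjustment is zero."""
    if wifi_features is None:
        wifi_features = np.zeros(16)
    delta = (wifi_features @ W_COND).reshape(8, 8, 3)
    return np.clip(ROOM_PRIOR + delta, 0.0, 1.0)


no_wifi = generate()                        # still "the room"
with_wifi = generate(rng.normal(size=16))   # the room, slightly modulated

print("mean change caused by WiFi input:", np.abs(with_wifi - no_wifi).mean())
```

With no WiFi input at all, `generate()` returns the learned room image unchanged, which is why the colours come out "right".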