I'm confused about how it gets things like the floor colour and clothing colour correct.

It seems like they might be giving it more information besides the WiFi data, or else maybe training it on photos of the actual person in the actual room, in which case it's not obvious how well it would generalise.

> I'm confused about how it gets things like the floor colour and clothing colour correct.

The model was trained on the room.

It would produce images of the room even without any WiFi data input at all.

The WiFi is used as a modulator on the input to the pre-trained model.

It’s not actually generating an image of the room from only WiFi signals.
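If that reading is right, the pipeline might look roughly like the sketch below: a small CSI encoder trained to map WiFi measurements into the latent space of an image autoencoder that already renders this particular room well, with the pre-trained decoder doing the visual heavy lifting. All names, shapes, and the choice of Stable Diffusion's VAE here are assumptions for illustration, not the paper's actual code.

```python
# Minimal sketch of the "CSI modulates a pre-trained image model" idea.
# All names, shapes, and the use of Stable Diffusion's VAE are assumptions
# for illustration; this is not the paper's actual pipeline.
import torch
import torch.nn as nn
from diffusers import AutoencoderKL

class CSIEncoder(nn.Module):
    """Maps a flattened CSI amplitude vector to an image latent."""
    def __init__(self, csi_dim=2048, latent_shape=(4, 64, 64)):
        super().__init__()
        self.latent_shape = latent_shape
        out_dim = latent_shape[0] * latent_shape[1] * latent_shape[2]
        self.net = nn.Sequential(
            nn.Linear(csi_dim, 4096), nn.GELU(),
            nn.Linear(4096, out_dim),
        )

    def forward(self, csi):
        z = self.net(csi)
        return z.view(-1, *self.latent_shape)

# Pre-trained VAE decoder: it already "knows" how to render plausible
# room photos; the CSI-derived latent only selects/modulates which one.
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")
csi_encoder = CSIEncoder()

csi = torch.randn(1, 2048)             # stand-in for one CSI capture
latent = csi_encoder(csi)              # CSI -> latent (encoder trained on room photos)
with torch.no_grad():
    image = vae.decode(latent).sample  # decoder supplies floor/clothing colours etc.
```

In a setup like this, the colours and textures come from what the decoder and the CSI encoder memorized about that room during training; the WiFi input only steers which memorized configuration gets rendered.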

This is what the GP alludes to: the original dataset contains many similar reference images (i.e. the common mode is the same), and the LatentCSI model is tasked with reconstructing the correct specific instance (or a similarly plausible image, in the case of the test/validation set).

It wouldn't generalize at all. As far as I can tell, the Wi-Fi is just differentiating among a small set of possible object placements/orientations within that fixed space, and the output is then a suitably adjusted version of the photos it was trained on.
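To make that reading concrete, here is a toy, purely hypothetical sketch in which the CSI effectively acts as a lookup key over a handful of memorized room configurations; nothing here comes from the paper.

```python
# Toy illustration of the "CSI as a lookup key" reading: the signal only
# distinguishes among a small set of memorized configurations of one room.
# The data, dimensions, and distance metric are all hypothetical.
import numpy as np

# Suppose training stored CSI fingerprints for a handful of person
# positions/poses in the one room, each paired with a room photo.
stored_fingerprints = np.random.randn(8, 2048)   # 8 memorized configurations
stored_photo_ids = [f"room_pose_{i}.png" for i in range(8)]

def retrieve_configuration(new_csi: np.ndarray) -> str:
    """Return the memorized photo whose CSI fingerprint is closest."""
    dists = np.linalg.norm(stored_fingerprints - new_csi, axis=1)
    return stored_photo_ids[int(np.argmin(dists))]

print(retrieve_configuration(np.random.randn(2048)))
# A CSI capture from a different room would still map onto one of these
# eight photos, which is why this approach would not generalize.
```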
