The image examples from the paper are absolutely insane.
Is this just extremely overfitted?
Is there a way for us to test this? Even if the model isn't open source, I'd pay $1 to take a capture from the wifi card on my linux box, send it to the researchers, and have them generate a picture so we could see whether it's accurate.
That is not how it works. The images of the room are included in the model's training data. The wifi is "just" helping identify the locations of objects in the room.
If you uploaded a random room to the model without retraining it, you wouldn't get anything as accurate as the images in the paper.
Yeah, this seems too insane to be true. I understand that wifi signal strength etc. is heavily affected by the contents of a room, but even so it seems far-fetched that there's enough information in that distortion to produce these results.
Wifi sensing results with high-dimensional outputs usually rely on wideband links... your average wifi connection uses 20MHz of bandwidth and transmits on 48 spaced-out frequencies. In the paper, we use 160MHz with effectively 1992 input data points. Even that isn't enough information to predict a 3x512x512 image well, which motivated predicting 4x64x64 latent embeddings instead.
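To make the dimensionality gap concrete, here's a rough back-of-envelope sketch. The 1992-subcarrier figure and the 3x512x512 vs 4x64x64 shapes are from the comment above; treating each subcarrier as one complex channel estimate (two real values) and ignoring antennas/time snapshots are my assumptions:

```python
# Rough dimensionality comparison: CSI input vs. candidate prediction targets.
# Assumes each of the 1992 subcarriers yields one complex channel estimate
# (2 real numbers); antenna counts, snapshots over time, etc. are ignored.

subcarriers = 1992
csi_features = subcarriers * 2          # real + imaginary parts per tone

rgb_image = 3 * 512 * 512               # predicting pixels directly
latent    = 4 * 64 * 64                 # predicting a compact latent instead

print(f"CSI input features:          {csi_features:>9,}")   # ~3,984
print(f"Full-image output values:    {rgb_image:>9,}")      # 786,432 (~197x the input)
print(f"Latent-space output values:  {latent:>9,}")         # 16,384  (~4x the input)
```

The full image has roughly 200 output values per input feature, while the latent target is only about 4x the input size, which is why regressing the latent is far more tractable.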
The more space you take up in the frequency domain, the higher your resolution in the time domain. Wifi sensing systems that detect heart rate or breathing, for example, use even larger bandwidths, to the point where it'd be more accurate to call them radars than wifi access points.
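Quick numbers to make the bandwidth/resolution tradeoff concrete, using the standard delta_t = 1/B and delta_d = c/(2B) back-of-envelope relations (these are textbook radar approximations, not figures from the paper):

```python
# Back-of-envelope: how signal bandwidth limits delay (time) and range resolution.
# delta_t = 1 / B, and the two-way range resolution is delta_d = c / (2 * B).

C = 299_792_458  # speed of light, m/s

for bw_mhz in (20, 160):
    bw = bw_mhz * 1e6
    delta_t = 1 / bw                 # finest resolvable delay difference
    delta_d = C / (2 * bw)           # corresponding two-way path resolution
    print(f"{bw_mhz:>3} MHz -> {delta_t * 1e9:5.2f} ns delay resolution, "
          f"~{delta_d:.2f} m range resolution")
```

So a plain 20MHz link can only separate multipath components about 7.5m apart, while 160MHz gets you under a meter, which is why wider bandwidth buys so much sensing capability.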