“We developed a deep neural network that maps the phase and amplitude of WiFi signals to UV coordinates within 24 human regions. The results of the study reveal that our model can estimate the dense pose of multiple subjects, with comparable performance to image-based approaches, by utilizing WiFi signals as the only input.”
I'm an EE, and I have serious doubts about this actually working nearly as well as they make it sound. This sort of stuff is hard, even with purpose-built radar systems. I work on angle estimation in multipath environments, and that shit fucks your signals up. This may work if you have extremely precisely characterised the target room and walls, and a ton of stuff around it, and then don't change anything but the motion of the people. But that's not practical.
You are correct: at best, this requires some a priori knowledge of the room. You can kind of do basic motion detection blindly, though. They are just measuring the channel response via the 802.11 preambles, so for basic presence detection, knowing that the channel response is changing is enough.
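To make the "channel response is changing" point concrete, here's a rough Python sketch of blind motion detection. It assumes you already have per-packet channel estimates as lists of complex per-subcarrier gains; how you get those out of real hardware is not shown. The function names, the threshold, and the window size are all made up for illustration, and the synthetic data only stands in for real captures. It uses amplitudes only, on the assumption that raw phase from commodity hardware is too noisy to compare packet to packet.

```python
import cmath
import random


def change_score(prev, curr):
    """Normalized amplitude change between two channel snapshots."""
    num = sum((abs(c) - abs(p)) ** 2 for p, c in zip(prev, curr))
    den = sum(abs(p) ** 2 for p in prev)
    return num / den if den > 0 else 0.0


def detect_motion(snapshots, threshold=0.01, window=5):
    """Flag motion wherever the moving average of change scores exceeds threshold.

    Returns one bool per consecutive pair of snapshots.
    threshold and window are illustrative values, not tuned ones.
    """
    scores = [change_score(a, b) for a, b in zip(snapshots, snapshots[1:])]
    flags = []
    for i in range(len(scores)):
        recent = scores[max(0, i - window + 1): i + 1]
        flags.append(sum(recent) / len(recent) > threshold)
    return flags


def synthetic_snapshots(n, moving, seed=0, subcarriers=16):
    """Hypothetical stand-in for real captures: a fixed channel plus small noise.

    When moving is True, each subcarrier gain is also scaled randomly per
    packet to mimic a changing multipath environment.
    """
    rng = random.Random(seed)
    base = [cmath.rect(1.0, rng.uniform(-cmath.pi, cmath.pi))
            for _ in range(subcarriers)]
    out = []
    for _ in range(n):
        snap = []
        for h in base:
            gain = rng.uniform(0.5, 1.5) if moving else 1.0
            noise = complex(rng.gauss(0, 0.005), rng.gauss(0, 0.005))
            snap.append(gain * h + noise)
        out.append(snap)
    return out


if __name__ == "__main__":
    print(any(detect_motion(synthetic_snapshots(50, moving=False))))
    print(all(detect_motion(synthetic_snapshots(50, moving=True))))
```

Note that this only tells you that something in the channel changed. It says nothing about where or what, which is why it needs no knowledge of the room.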
I was under the impression these experiments required a pre-mapped room with EM readings. I don't think they can watch you as if it were an X-ray, but I'd believe they could track blobs of moving mass.
There are already smart light bulbs you can buy off the shelf that use radio signals to see when somebody is in the room. They can then turn on the lights automatically, without a camera or infrared sensor in the area.