Any idea how that works? Something with the microphone maybe?

Probably the same accelerometers and gyroscope used for Spatial Audio head tracking.