In music, simple panning works okay, but it never places a source outside the stereo base of the speaker arrangement. For a truly immersive listening experience, audio engineers often also use timing differences and separate spectral treatment of the stereo channels, with HRTF being the cutting edge of that.
As far as I know, Atmos as used in cinemas is amplitude based (probably VBAP), and it is impressive and immersive. Immersion depends more on the number and placement of loudspeakers than on the panning method. Some systems do use Ambisonics, which can encode time differences as well, at least from microphone recordings.
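To make the amplitude-based idea concrete, here is a minimal sketch of 2D VBAP for a single loudspeaker pair. It is only an illustration of the general technique, not a description of how Atmos renders; the function name `vbap_2d` and the ±30° speaker angles are my own assumptions. It also shows why plain amplitude panning stays inside the speaker base: a source outside the pair would need a negative gain.

```python
import math

def vbap_2d(source_deg, spk1_deg=30.0, spk2_deg=-30.0):
    """Return (g1, g2) gains for one speaker pair (hypothetical helper).

    Solves g1*l1 + g2*l2 = p for the unit vectors l1, l2 (speakers)
    and p (source direction), then normalises to constant power.
    """
    def unit(deg):
        rad = math.radians(deg)
        return math.cos(rad), math.sin(rad)

    px, py = unit(source_deg)
    l1x, l1y = unit(spk1_deg)
    l2x, l2y = unit(spk2_deg)

    # Cramer's rule on the 2x2 system [l1 l2] * [g1 g2]^T = p
    det = l1x * l2y - l2x * l1y
    if abs(det) < 1e-12:
        raise ValueError("speakers are collinear with the listener")
    g1 = (px * l2y - l2x * py) / det
    g2 = (l1x * py - px * l1y) / det

    # Constant-power normalisation: g1^2 + g2^2 = 1
    norm = math.hypot(g1, g2)
    return g1 / norm, g2 / norm

if __name__ == "__main__":
    print(vbap_2d(0.0))    # centred source: equal gains
    print(vbap_2d(30.0))   # source on speaker 1: only speaker 1 plays
    print(vbap_2d(60.0))   # outside the pair: one gain turns negative
```

A real multi-speaker renderer would pick the pair (or triplet, in 3D) that encloses the source so that all gains stay non-negative.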
HRTF, as used in binaural synthesis, is meant for headphone playback, so it is not relevant here.