Seems somewhat niche due to physics. When you are ten feet away from a screen (or even three), you can scarcely distinguish between audio emanating from the upper-left “pixel/voxel” (to give a new meaning to an old word) and from the bottom-right, let alone from two adjacent locations.

I think you're trying to make an argument similar to the one against "retina" displays, i.e. that there is some minimum perceptual angular resolution for sound, so any information finer than that is pointless? If so, I think you're either underestimating the perceptual resolution or assuming a very small screen at a large distance.
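As a rough sanity check (everything here is an assumption: the commonly cited ~1 degree minimum audible angle for frontal sources, plus guessed screen sizes and viewing distances):

    import math

    # Rough sketch: compare the commonly cited ~1 degree minimum
    # audible angle (MAA) for frontal sound sources against the angle
    # a screen subtends. MAA, sizes, and distances are assumptions.

    MAA_DEG = 1.0  # approximate frontal minimum audible angle, degrees

    def angular_size_deg(width_m: float, distance_m: float) -> float:
        """Horizontal angle subtended by a screen at a given distance."""
        return math.degrees(2 * math.atan(width_m / (2 * distance_m)))

    for name, width, dist in [
        ("14-inch laptop at 0.5 m", 0.31, 0.5),
        ("55-inch TV at 3 m", 1.22, 3.0),
        ("phone in landscape at 0.5 m", 0.15, 0.5),
    ]:
        span = angular_size_deg(width, dist)
        print(f"{name}: spans ~{span:.0f} deg, "
              f"~{span / MAA_DEG:.0f} distinguishable horizontal positions")

Even with those coarse numbers, a laptop or TV spans tens of minimum audible angles, so sub-screen localization isn't obviously below the perceptual floor.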

I think that kind of resolution is good enough to overlap a lot of task-focused screen fields of view. I have experienced a pretty clear "central" sound stage spanning 30-45 degrees or so with regular stereo speakers. That field can imply a lot of subtle positioning within it, even before considering wild panning mixes. I'm talking about the kind of realistic recording where it feels like a band is in front of you, with different instruments near each other but not colocated, like an acoustic ensemble. Obviously you cannot shrink this down to a phone at arm's length or a small embedded control screen and still have the same spatial resolution crammed into a very narrow field.

When I sit with a 14" laptop at a normal typing distance, it is also easy to distinguish tapping sounds along the screen. I just did a blind test by closing my eyes and having my wife tap the screen with a plastic pen. The interesting question to me, though, is whether that is pure binaural perception, or whether it incorporates some intuition about the screen's structure. Its response to a tap does clearly vary with distance from the bezels and hinges...
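For a back-of-the-envelope check on the binaural part (a sketch; the head width, geometry, and simple far-field model are all assumptions):

    import math

    # Interaural time difference (ITD) for a tap at the edge of a
    # ~31 cm wide 14" screen viewed from typing distance. The simple
    # far-field ITD model and all the dimensions are assumptions.

    C = 343.0            # speed of sound in air, m/s
    EAR_SPACING = 0.18   # approximate interaural distance, m
    SCREEN_HALF = 0.155  # half-width of the screen, m
    DISTANCE = 0.5       # eyes-to-screen distance while typing, m

    def itd_us(azimuth_rad: float) -> float:
        """Far-field ITD: interaural path difference / speed of sound."""
        return EAR_SPACING * math.sin(azimuth_rad) / C * 1e6

    edge = math.atan(SCREEN_HALF / DISTANCE)
    print(f"edge tap: azimuth ~{math.degrees(edge):.0f} deg, "
          f"ITD ~{itd_us(edge):.0f} microseconds")
    # Reported ITD discrimination thresholds are on the order of
    # 10 microseconds, so edge-to-edge taps (~150 us apart) should be
    # resolvable by the binaural system alone, before structural cues.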

Impressed that you took the time to run a quick test. Compressing a spatial sound field into a compact device is a hard problem, though: the most expensive sound bars are easily beaten by a cheap 2.1 setup. Phones (mostly) still have mono output, even though speakers firing out the top and bottom edges (perpendicular to the viewing axis) would be a win for watching videos in landscape, probably because the improvement wouldn't be appreciated enough to be economically worthwhile.

Interesting research all the same, of course!

Not niche at all. You could have a phone, for example, that plays sounds from the areas of the screen where they originate: key presses, buttons, notification pop-ups, etc.
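As a toy illustration of the routing (entirely hypothetical: no current phone exposes a grid of screen exciters, and these names are made up):

    # Hypothetical sketch: route a UI sound to the screen exciter
    # nearest the on-screen event. The 4x3 exciter grid, resolution,
    # and function name are invented for illustration only.

    GRID_COLS, GRID_ROWS = 4, 3      # assumed grid of screen exciters
    SCREEN_W, SCREEN_H = 1920, 1080  # assumed screen resolution, px

    def nearest_exciter(x_px: int, y_px: int) -> tuple[int, int]:
        """Map an event's screen coordinates to the closest exciter."""
        col = min(int(x_px / SCREEN_W * GRID_COLS), GRID_COLS - 1)
        row = min(int(y_px / SCREEN_H * GRID_ROWS), GRID_ROWS - 1)
        return col, row

    # A notification popping up near the top-right corner:
    print(nearest_exciter(1700, 90))  # -> (3, 0)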

You missed (or at any rate didn't address) my point. On a phone, every audio source sits within the narrow span between your two ears (or worse, off to one side of both ears), with only a minute difference in the angle each location subtends at the two binaural inputs. Convince me that you could distinguish sounds emanating from different locations, all facing the same direction (nothing like a speaker firing out each side of the phone!), at a rate statistically distinguishable from chance.

With enough speakers spaced on the order of the sound's wavelength or closer (and for most audible frequencies, screen-mounted emitters would be), you can use beamforming to aim different sounds in different directions from a single surface, even with every speaker facing the same way.
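A minimal sketch of the delay-and-sum version (element count, spacing, and steering angle are made up for illustration):

    import math

    # Delay-and-sum beamforming: per-element delays that steer the
    # main lobe of a uniform linear speaker array toward a target
    # angle, with every element facing the same direction.

    C = 343.0       # speed of sound, m/s
    N = 8           # number of speaker elements (assumed)
    SPACING = 0.02  # element spacing, m (~half wavelength at ~8.6 kHz)

    def steering_delays(theta_deg: float) -> list[float]:
        """Delays (s) that steer the beam to theta, measured from
        broadside (straight out of the array). Delaying each element
        by its extra path length makes all wavefronts line up along
        the target direction."""
        theta = math.radians(theta_deg)
        delays = [n * SPACING * math.sin(theta) / C for n in range(N)]
        offset = min(delays)  # keep every delay non-negative
        return [d - offset for d in delays]

    for n, d in enumerate(steering_delays(30.0)):
        print(f"element {n}: delay {d * 1e6:6.1f} microseconds")

Feed each element the same signal shifted by its delay and the emissions add coherently at 30 degrees while (mostly) cancelling elsewhere; play several delayed copies at once and you get different sounds aimed in different directions.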

For an extreme example of this, look at the Sphere, where sounds can be targeted at individual audience members from arbitrary directions using the speaker arrays built into the wraparound display.

If you get enough of them, you might be able to do some really interesting things by driving the whole surface as a phased array.