You missed (or at any rate didn't address) my point. On a phone, all of the audio channels sit between your ears (or, even worse, are held off to the right or left of both ears), so the angle from each speaker to each of your two ears differs only minutely. With every speaker facing the same direction (not at all like a speaker pointing out the side of each phone!!), convince me that you can distinguish sounds emanating from different locations at a rate statistically distinguishable from chance.
With enough speakers spaced apart on the order of the sound's wavelength (and for most frequencies, it seems they will be), you can use beamforming to aim different sounds in different directions from a single device, even with all of its speakers facing the same way.
For an extreme example of this, look at the Sphere in Las Vegas, which can target sounds at individual audience members from any direction using the speakers in its surround display.
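To make the beamforming idea concrete, here is a minimal delay-and-sum sketch. It assumes an idealized uniform line of point-source speakers radiating into free field, with no phone body, enclosure, or listener's head modeled. The speaker count, spacing, and frequencies are hypothetical illustration values, not measurements of any real phone. Delaying each speaker by an amount proportional to its position tilts the direction in which all of the waves arrive in phase.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, air at roughly 20 degrees C


def steering_delays(num_speakers, spacing, steer_deg, c=SPEED_OF_SOUND):
    """Per-speaker delays (s) that tilt an ideal line array's main lobe to steer_deg.

    Angles are measured from broadside (0 deg = straight out of the array's face).
    Assumes identical, evenly spaced point-source speakers (a hypothetical model).
    """
    s = math.sin(math.radians(steer_deg))
    raw = [n * spacing * s / c for n in range(num_speakers)]
    offset = min(raw)
    return [t - offset for t in raw]  # shift so every delay is non-negative


def array_response(freq, num_speakers, spacing, steer_deg, look_deg, c=SPEED_OF_SOUND):
    """Normalized far-field magnitude (0..1) at look_deg when steered toward steer_deg."""
    k = 2 * math.pi * freq / c  # wavenumber
    omega = 2 * math.pi * freq  # angular frequency
    s_look = math.sin(math.radians(look_deg))
    delays = steering_delays(num_speakers, spacing, steer_deg, c)
    # Speaker n's wave gains phase k*x_n*sin(look) from its position and loses
    # omega*tau_n from its steering delay; the two cancel exactly at look == steer.
    total = sum(
        cmath.exp(1j * (k * n * spacing * s_look - omega * tau))
        for n, tau in enumerate(delays)
    )
    return abs(total) / num_speakers


if __name__ == "__main__":
    # Hypothetical array: 8 speakers, 5 cm apart, beam steered 30 deg to one side.
    for freq in (200, 2000):
        for angle in (-60, -30, 0, 30, 60):
            r = array_response(freq, 8, 0.05, 30, angle)
            print(f"{freq} Hz, {angle:+d} deg: {r:.3f}")
```

Running it shows both sides of the argument. At 2 kHz the hypothetical array is strongly directional, but at 200 Hz the same array is much smaller than the wavelength and the beam becomes broad. How well this works therefore depends heavily on how the speaker spacing compares to the wavelength.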