I wasn't thinking of a 16x8 array; I was thinking of a 160x80 array. Spacing your speakers too closely will have diminishing returns, but it depends on the frequency at which you want to operate. If we assume a frequency of 20kHz, you should space your speakers at no more than half the wavelength to avoid spatial aliasing artifacts, so something like a speaker every 8mm. This is especially important if the listener position is close.
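
As a rough sanity check on that spacing figure, here is a minimal sketch. The speed of sound (343 m/s, roughly air at 20 °C) and the function name are assumptions for illustration, not something stated above.

```python
# Assumption: speed of sound in dry air at about 20 °C.
SPEED_OF_SOUND_M_S = 343.0


def half_wavelength_spacing_mm(frequency_hz, speed_of_sound=SPEED_OF_SOUND_M_S):
    """Half the wavelength at frequency_hz, in millimetres.

    This is the usual upper bound on element spacing for avoiding
    spatial aliasing at that frequency.
    """
    wavelength_m = speed_of_sound / frequency_hz
    return wavelength_m / 2 * 1000.0


spacing_mm = half_wavelength_spacing_mm(20_000)
print(f"max spacing at 20 kHz: {spacing_mm:.1f} mm")
```

With these numbers the bound comes out a little above 8mm, so a speaker every 8mm sits just inside it.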

This means about 13,000 precise delay lines (160 × 80 = 12,800), multiplied by the number of virtual sound sources you want to allow at the same time; let's assume 64. At a sampling rate of 48kHz, that means about 39x10⁹ samples per second. That isn't nothing, especially for a consumer device, and especially if we assume the delay values for each of the virtual source-speaker combinations need to be adjusted on the fly.
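
The arithmetic behind that estimate can be sketched as follows. The function and variable names are hypothetical, and it assumes one delayed sample per speaker per virtual source per sample period, with nothing shared or skipped.

```python
def delay_line_throughput(cols, rows, sources, sample_rate_hz):
    """Return (speakers, delay_lines, samples_per_second) for a speaker array.

    Assumes every speaker needs its own delay line for every virtual source.
    """
    speakers = cols * rows
    delay_lines = speakers * sources
    samples_per_second = delay_lines * sample_rate_hz
    return speakers, delay_lines, samples_per_second


speakers, delay_lines, samples_per_second = delay_line_throughput(
    cols=160, rows=80, sources=64, sample_rate_hz=48_000
)
print(f"{speakers} speakers, {delay_lines} delay lines, "
      f"{samples_per_second / 1e9:.1f}e9 samples/s")
```

Plugging in a 16x8 array instead drops every figure by a factor of 100, which is why the array size matters so much here.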

Hmm, I see. I think that you can cheat quite a bit more than that, though, if your objective is only to fool human hearing (as your 48ksps and 20kHz numbers suggest): humans can only use phase information to detect the directionality of sound up to a few hundred Hz, relying entirely on amplitude attenuation above that, presumably because their neurons run too slowly. But maybe your objective is sonar beamforming or something.