Unfortunately, you're usually working against physics rather than software. As you allude to, playing music together requires near-zero latency (some sources give 10 ms as an absolute maximum), and that's physically impossible to achieve over a long distance, even with a perfect connection.
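To put rough numbers on the physics, here is a back-of-the-envelope sketch. It assumes light in optical fiber travels at about the vacuum speed divided by a refractive index of 1.47 (a typical figure, not a measurement of any real link), and it ignores routing, switching, and audio buffering, all of which only add delay. The function names and the 5,500 km example distance are made up for illustration.

```python
# Lower bound on network delay from propagation alone.
C_VACUUM_KM_S = 299_792.458   # speed of light in vacuum, km/s
FIBER_INDEX = 1.47            # assumed refractive index of optical fiber


def one_way_delay_ms(distance_km: float, refractive_index: float = FIBER_INDEX) -> float:
    """Best-case one-way propagation delay over fiber, in milliseconds."""
    speed_km_s = C_VACUUM_KM_S / refractive_index
    return distance_km / speed_km_s * 1000.0


def max_distance_km(budget_ms: float, refractive_index: float = FIBER_INDEX) -> float:
    """Longest fiber run whose one-way delay still fits in budget_ms."""
    speed_km_s = C_VACUUM_KM_S / refractive_index
    return speed_km_s * budget_ms / 1000.0


if __name__ == "__main__":
    # Distance reachable within the 10 ms figure quoted above.
    print(f"10 ms budget covers about {max_distance_km(10):.0f} km of fiber")
    # Hypothetical long-haul link of 5,500 km.
    print(f"5,500 km takes at least {one_way_delay_ms(5500):.1f} ms one way")
```

Under these assumptions a 10 ms one-way budget is used up after roughly 2,000 km of fiber, before any real-world overhead is counted.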
Yeah, right? But there must be some clever tricks, like a 1-bar buffer and only sending timing info and control data over the network, hmm.
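Something like this is what I have in mind for the 1-bar buffer, as a minimal sketch only. It assumes both ends already share a tempo and a bar count, and that only small control events (not audio) cross the network. `Event`, `playback_time`, and `arrives_in_time` are hypothetical names, not part of any real library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A control event stamped in musical time rather than wall-clock time."""
    bar: int      # bar in which the performer played it
    beat: float   # offset within that bar, in beats
    note: int     # e.g. a MIDI-style note number


def bar_seconds(bpm: float, beats_per_bar: int = 4) -> float:
    """Length of one bar in seconds."""
    return beats_per_bar * 60.0 / bpm


def playback_time(event: Event, bpm: float, beats_per_bar: int = 4,
                  delay_bars: int = 1) -> float:
    """Time (seconds from bar 0) at which the remote side plays the event."""
    beat_s = 60.0 / bpm
    return ((event.bar + delay_bars) * beats_per_bar + event.beat) * beat_s


def arrives_in_time(event: Event, network_delay_s: float, bpm: float,
                    beats_per_bar: int = 4, delay_bars: int = 1) -> bool:
    """True if the event reaches the remote side before its scheduled slot."""
    beat_s = 60.0 / bpm
    played_at = (event.bar * beats_per_bar + event.beat) * beat_s
    scheduled = playback_time(event, bpm, beats_per_bar, delay_bars)
    return played_at + network_delay_s <= scheduled


if __name__ == "__main__":
    e = Event(bar=0, beat=0.0, note=60)
    print(bar_seconds(120))              # bar length at 120 bpm in 4/4
    print(arrives_in_time(e, 0.1, 120))  # 100 ms of network delay
```

The point is that the buffer trades interactivity for tolerance: at 120 bpm in 4/4 a bar lasts 2 seconds, so any network delay under that is absorbed, but each performer hears the other a full bar late.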
Perhaps you're being coy, but I'm pretty sure people do this. The performers can collaborate in "real time" (still offset from each other in reality), and the other participants (dancers and listeners) only hear the finished music, at the same time as all the other participants.
The latency is in the audience/performer energy.