I don't know about podcasts and stuff, but for music, you can already solve this puzzle real quick with OBS, using the natural song bars as synchronization steps.
1. One musician plays simple bars, repetitive stuff, and streams it.
2. Second musician receives the audio from musician 1 and records a multi-audio-track video: his own playing in one track, musician 1's output in another.
3. Stack indefinitely.
You play to what you hear, in real time. All tracks are recorded separately on separate computers and can be edited together quite easily.
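The timing implication of the stacking above can be sketched as simple arithmetic. This is a hypothetical sketch: the 300 ms per-hop figure is an assumed stream delay, not a measured one.

```python
# Hypothetical sketch of the stacked workflow's timing: each musician
# plays along with the previous musician's stream, so the one-way delay
# back to musician 1's playing grows by one hop per layer.
# The 300 ms per-hop delay is an assumption for illustration only.

def cumulative_delay_ms(hop_delay_ms: int, n_musicians: int) -> int:
    """One-way delay from musician 1's playing to musician n's ears."""
    return hop_delay_ms * (n_musicians - 1)

for n in range(2, 5):
    print(f"musician {n} hears musician 1 {cumulative_delay_ms(300, n)} ms late")
```

The recordings themselves are unaffected by this delay; it only bounds how far back up the chain anyone can react.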
Plus, this is already how most jams work in real life.
> "now" isn't a single universal instant, it's a moving target
Rhythm is already a moving target, a delay cycle. Musicians just need to know the beat 1 for each bar (which they should already know, as it is their job).
Yes, but musician 1 can't possibly react to musician 2's output meaningfully, because it happens after musician 2 has listened to musician 1 and played their part. That's not how jams with musicians physically in the same room work.
Fair enough, you couldn't have something like the vocalist cueing the bassist and the drummer picking it up out of thin air and doing an improvised roll, as happens here:
https://youtu.be/eg_rgm9VDAw?t=1597
The drummer takes a little more than a second to react to it (longer than a lot of stream delays, by the way, though I can see how the stacking could mess it up).
That is, however, a bunch of experienced jazz musicians improvising at a high level. In most jams, these conversations very often happen on the next bar (1 and 2 and 3 and 4, and you react on the next 1).
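The bar arithmetic backs this up. A rough sketch, where the tempo figures are just illustrative examples:

```python
def bar_length_ms(bpm: float, beats_per_bar: int = 4) -> float:
    """Duration of one bar in milliseconds."""
    return beats_per_bar * 60_000 / bpm

# At a typical 120 BPM, a 4/4 bar lasts 2000 ms, so even a 400 ms
# stream delay leaves most of the bar to catch a cue and answer on
# the next 1.
for bpm in (90, 120, 160):
    print(f"{bpm} BPM -> {bar_length_ms(bpm):.0f} ms per bar")
```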
You can see a drummer using the end of a bar to cue the flourish used on the organ in this part, for example:
https://youtu.be/jhicDUgXyNg?t=587
It takes multiple seconds for the organist to recognize the cue, which is actually for the next bar, and then he joins in. This sort of stuff is doable with just video chat and OBS.
Please also note that the product's example workflow is actually worse in that "reaction jammyness" regard than what I proposed:
> The performer receives the track early, and waits the rest of the delay period to play it
This is designed for recording. It sounds more like a studio arrangement in which you have to record your part than a jam session.
> The fidelity of the live stream isn't high enough to record
Seems like an incomplete product. OBS can already record multiple audio tracks and monitor them, which you can leverage to produce high-quality artifact recordings. I usually sync them manually using a DAW, but with all those auto-timers, it's a surprise it doesn't do it automatically.
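For the automatic part, one common approach (a sketch of a general technique, not an OBS feature) is to align two recordings by cross-correlating a reference signal both tracks contain, such as each machine's copy of the monitored stream:

```python
import numpy as np

def estimate_offset(reference: np.ndarray, delayed: np.ndarray) -> int:
    """Estimate, in samples, how far `delayed` lags behind `reference`
    using full cross-correlation (positive = `delayed` starts later)."""
    corr = np.correlate(delayed, reference, mode="full")
    return int(np.argmax(corr)) - (len(reference) - 1)

# Toy example: the same impulse arrives 2 samples later in one track.
ref = np.array([0.0, 1.0, 0.0, 0.0, 0.0])
dly = np.array([0.0, 0.0, 0.0, 1.0, 0.0])
print(estimate_offset(ref, dly))  # 2
```

Once you have the offset, shifting one track by that many samples lines them up, which is essentially what the manual DAW nudging does by ear.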
> This sort of stuff is doable with just video chat and OBS.
If what each person is hearing is 100-400 ms delayed from what each person is producing, how can they possibly mutually react or even keep their music in time? If B plays in time with what they hear from C, C hears what B did 200-800 ms later, which is far too much and will sound terrible.
Jamming would seem to require incredibly low latency audio just for the rhythm to work between two performers.
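Putting numbers on that round trip (using the 100-400 ms one-way figures above, which are assumed, not measured):

```python
def heard_offset_ms(one_way_ms: int) -> int:
    """If B plays exactly in time with C's stream as B hears it,
    C hears B's notes this many ms behind C's own playing."""
    return 2 * one_way_ms  # B's listening delay plus the return trip

for d in (100, 400):
    print(f"{d} ms one-way -> C hears B {heard_offset_ms(d)} ms late")
```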
I just showed you, with examples. Musicians react to musical structure, which can be very loose compared to what engineers think of as latency. A 12-bar blues leaves lots of free time to improvise without feedback.
Also, the stacked delay is part of their product. My solution just gets it for free, but it's the same idea.