Yes, but musician 1 can't meaningfully react to musician 2's output, because that output only exists after musician 2 has already listened to musician 1 and played their part. That's not how jams with musicians physically in the same room work.

Fair enough, you couldn't have something like the vocalist cueing the bassist and the drummer picking it up out of thin air and doing an improvised roll, as happens here:

https://youtu.be/eg_rgm9VDAw?t=1597

The drummer takes a little over a second to react to it (longer than a lot of stream delays, by the way, though I can see how the stacking could mess that up).

That is, however, a bunch of experienced jazz musicians improvising at a high level. In most jams, these conversations most often happen on the next bar (1 and 2 and 3 and 4, and you react on the next 1).
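
To put rough numbers on that (the tempos and the 0.4s stream delay below are illustrative assumptions, not measurements from the clips), here is a quick sketch of how much slack "react on the next 1" actually leaves:

```python
# Back-of-envelope: time left in a bar after a typical one-way stream delay.
# Tempos and the 0.4s delay are assumptions for illustration.

def bar_seconds(bpm: float, beats_per_bar: int = 4) -> float:
    """Length of one bar in seconds at the given tempo."""
    return beats_per_bar * 60.0 / bpm

stream_delay = 0.4  # assumed one-way stream delay in seconds
for bpm in (90, 120, 160):  # slow blues, mid tempo, uptempo
    bar = bar_seconds(bpm)
    print(f"{bpm:3d} BPM: bar = {bar:.2f}s, "
          f"slack after a {stream_delay:.1f}s delay = {bar - stream_delay:.2f}s")
```

Even at the brisk end there is still over a second of the bar left to catch a cue and answer on the next 1.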

You can see a drummer using the end of a bar to cue the flourish the organ plays in this part, for example:

https://youtu.be/jhicDUgXyNg?t=587

It takes the organist a couple of seconds to recognize the cue, which is actually for the next bar, and then he joins in. This sort of thing is doable with just video chat and OBS.

Please also note that the product's example workflow is actually worse than what I proposed in that "reaction jamminess" regard:

> The performer receives the track early, and waits the rest of the delay period to play it

This is designed for recording. It sounds more like a studio session where you have to record your part than like a jam.

> The fidelity of the live stream isn't high enough to record

Seems like an incomplete product. OBS can already record multiple audio tracks and monitor them, which you can leverage to produce high-quality recordings as the final artifact. I used to sync them manually in a DAW, but given all those automatic delay timers, it's surprising it doesn't do it automatically.
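
To illustrate the "it could do it automatically" point, here is a minimal sketch of the usual cross-correlation trick for lining up two takes, assuming two mono WAV files at the same sample rate (the filenames are placeholders); it is essentially what you eyeball in the DAW:

```python
# Sketch: estimate the offset between two recorded takes by cross-correlation.
# Assumes mono WAV files at the same sample rate; filenames are placeholders.
import numpy as np
from scipy.io import wavfile
from scipy.signal import correlate

rate_a, a = wavfile.read("performer_a.wav")
rate_b, b = wavfile.read("performer_b.wav")
assert rate_a == rate_b, "resample first if the rates differ"

a = a.astype(np.float64)
b = b.astype(np.float64)

# The peak of the cross-correlation tells us how far B is shifted relative to A.
corr = correlate(a, b, mode="full", method="fft")
lag = int(np.argmax(corr)) - (len(b) - 1)
print(f"estimated offset: {lag / rate_a:+.3f}s "
      f"(positive: B's events sit earlier in its file than in A's)")

# Align B to A: positive lag means B effectively started recording later,
# so pad its front; negative lag means trim the excess from its front.
b_aligned = np.concatenate([np.zeros(lag), b]) if lag > 0 else b[-lag:]
```

It is rough (a constant offset only, no drift correction), but it shows the alignment is recoverable from the audio itself, never mind the delay timers the product already keeps.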

> This sort of thing is doable with just video chat and OBS.

If what each person hears is 100-400ms behind what the others are playing, how can they possibly react to each other or even stay in time? If B plays in time with what they hear from C, then C hears what B did 200-800ms late; that's far too much and will sound terrible.

Jamming would seem to require extremely low-latency audio just for the rhythm to lock between two performers.
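
To put numbers on that (the tempo is an arbitrary assumption):

```python
# Compare the round trip between two mutually-listening performers against
# note lengths at an assumed tempo.
bpm = 120
beat = 60.0 / bpm                          # quarter note = 0.5s at 120 BPM
for one_way_ms in (100, 200, 400):
    round_trip = 2 * one_way_ms / 1000.0   # B -> C plus C -> B
    print(f"one-way {one_way_ms}ms: round trip = {round_trip * 1000:.0f}ms "
          f"= {round_trip / beat:.2f} beats at {bpm} BPM")
```

Even the best case is nearly half a beat of disagreement, and the worst case is more than a whole beat.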

I just showed you, with examples. Musicians react to musical structure, which operates on much looser timescales than what engineers think of as latency. A 12-bar blues gives you plenty of free time to improvise without needing immediate feedback.

Also, the stacked delay is part of their product. My solution just does it for free, but it's the same idea.
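
To sketch what I mean by that being the same idea (this is my reading of a chained-delay setup, not the product's actual implementation; all figures are assumed):

```python
# Chained ("stacked") delay sketch: each performer plays along with the
# previous performer's delayed stream, and their own stream is delayed again
# for the next person, so the per-hop delays accumulate down the chain.
hop_delay_s = 0.4                       # assumed one-way stream delay per hop
performers = ["drums", "bass", "keys", "vocals"]

bpm = 100                               # assumed tempo
bar_s = 4 * 60.0 / bpm                  # length of a 4/4 bar in seconds

behind = 0.0
for name in performers:
    print(f"{name:7s} plays {behind:.1f}s behind the first performer "
          f"({behind / bar_s:.2f} bars at {bpm} BPM)")
    behind += hop_delay_s

print(f"the audience hears the full mix roughly {behind:.1f}s late, "
      f"but each adjacent pair is only {hop_delay_s:.1f}s apart")
```

The absolute delay to the audience grows with the length of the chain, but each musician only ever plays against a feed that is one hop behind them; the bar structure absorbs the rest.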