WOW this is cool! I love this, but as a nitpick, how scalable is it to do each connection peer to peer? Doesn't that mean I have to keep a stream connection open for everyone I want to include in the room?

> how scalable is it to do each connection peer to peer?

I can tell you roughly how it works for webrtc video calls.

If you're in a 5-person peer-to-peer webrtc video call where you receive 4 streams of video, you also need to send 4 streams of video. This is scalable in a sense; the uplink and downlink requirements are equal.

The problem comes if you're in a 100-person meeting and the application logic has hidden 95 people's video to save on bandwidth. In that case, while you'd only receive 4 streams of video, you'd still have to send 99.
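
To put numbers on that, here's a tiny sketch of the counts in the two scenarios above (the helper function is mine, purely for illustration):

  // In a full mesh, every participant keeps one RTCPeerConnection per other
  // participant and uploads their video on every one of those connections,
  // regardless of how many remote videos the UI actually shows.
  function meshLoad(participants: number, videosShown: number) {
    const connections = participants - 1;  // one per remote peer
    const streamsSent = participants - 1;  // your video goes to everyone
    const streamsReceived = Math.min(videosShown, participants - 1);
    return { connections, streamsSent, streamsReceived };
  }

  console.log(meshLoad(5, 4));    // { connections: 4, streamsSent: 4, streamsReceived: 4 }
  console.log(meshLoad(100, 4));  // { connections: 99, streamsSent: 99, streamsReceived: 4 }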

In practice, webrtc video calling often uses an 'SFU' or 'Selective Forwarding Unit', where you send one video stream to the vendor's cloud server and it forwards that stream to the other people in the meeting. This also benefits people on asymmetric connections, mobile users for whom uploading costs battery life, and users behind highly restrictive firewalls where webrtc's NAT traversal fails to work.
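
Roughly, the client side of an SFU call looks like the sketch below: a single RTCPeerConnection to the server, your media sent once, and everyone else's media arriving over that same connection. (The signalling endpoint here is a made-up placeholder; every SFU has its own join/offer exchange.)

  // One connection to the SFU, one upstream copy of your media,
  // however many participants are in the room.
  async function joinViaSfu(signalUrl: string): Promise<RTCPeerConnection> {
    const pc = new RTCPeerConnection();
    const local = await navigator.mediaDevices.getUserMedia({ audio: true, video: true });
    for (const track of local.getTracks()) pc.addTrack(track, local);  // sent once, forwarded by the SFU
    pc.ontrack = (ev) => console.log('remote track from SFU:', ev.track.kind);

    const offer = await pc.createOffer();
    await pc.setLocalDescription(offer);
    // Hypothetical signalling: POST our offer, get the SFU's answer back.
    const resp = await fetch(signalUrl, { method: 'POST', body: JSON.stringify(offer) });
    await pc.setRemoteDescription(await resp.json());
    return pc;
  }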

> If you're in a 5-person peer-to-peer webrtc video call where you receive 4 streams of video, you also need to send 4 streams of video. This is scalable in a sense; the uplink and downlink requirements are equal.

The issue is not with the throughput: a typical videoconference requires 700kbit/s per stream, so even 12Mbit/s upstream is enough for 17 streams or so. The issue is having to encode the video separately for every receiver.
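
Spelled out with those figures:

  const perStreamKbit = 700;     // typical conference video stream
  const uplinkKbit = 12_000;     // 12 Mbit/s upstream
  console.log(Math.floor(uplinkKbit / perStreamKbit));  // 17, so throughput alone isn't the limit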

WebRTC adapts to the available throughput by encoding the video separately for every receiver, with different parameters. If you're in a five-person peer-to-peer conference, you decode four videos simultaneously, which is fine, but you're also encoding your video four times, which is not.
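
In code terms, the mesh looks roughly like the sketch below: the same camera track is added to one RTCPeerConnection per peer, and each of those connections runs its own encoder with its own rate adaptation. (Signalling is elided; the peer IDs are placeholders.)

  // Full-mesh setup: one RTCPeerConnection per remote peer, with the *same*
  // local track added to each. Every connection encodes the video
  // independently, adapted to that particular receiver's bandwidth, so
  // 4 peers means 4 simultaneous encodes of your camera.
  async function buildMesh(peerIds: string[]): Promise<Map<string, RTCPeerConnection>> {
    const local = await navigator.mediaDevices.getUserMedia({ video: true, audio: true });
    const mesh = new Map<string, RTCPeerConnection>();
    for (const id of peerIds) {
      const pc = new RTCPeerConnection();
      for (const track of local.getTracks()) pc.addTrack(track, local);
      // ... offer/answer exchange with this peer over your signalling channel ...
      mesh.set(id, pc);
    }
    return mesh;
  }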

An SFU works around the issue by not re-encoding the video: the SFU merely decrypts the video and re-encrypts it with the key it has negotiated with each receiver (SRTP keys are symmetric). Since AES is implemented in hardware, the re-encryption comes essentially for free.

(Of course, that implies that the SFU needs to use other techniques for bandwidth adaptation, such as simulcast or scalable video coding (SVC). See slides 10-12 of https://galene.org/galene-20250610.pdf if you're interested.)
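
For what it's worth, simulcast on the sending side is just a declaration on the transceiver: the browser encodes a few fixed layers once, and the SFU forwards whichever layer suits each receiver. A minimal sketch (the rid names and bitrates are arbitrary choices, not Galene's settings):

  function sendSimulcast(pc: RTCPeerConnection, videoTrack: MediaStreamTrack) {
    pc.addTransceiver(videoTrack, {
      direction: 'sendonly',
      sendEncodings: [
        { rid: 'low',  scaleResolutionDownBy: 4, maxBitrate: 150_000 },
        { rid: 'mid',  scaleResolutionDownBy: 2, maxBitrate: 500_000 },
        { rid: 'high', maxBitrate: 1_500_000 },
      ],
    });
  }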

> This is scalable in a sense; the uplink and downlink requirements are equal.

But don't most home connections have a slower uplink than downlink? Mine certainly does.

> how scalable is it

Considering the site just spams my error console with

  DOMException: Failed to construct 'RTCPeerConnection': Cannot create so many PeerConnections

I'd say not very.

When I experimented with this a few years back, a true NxN room would cap out at around 8 people on PCs and 4 on mobile; the bottleneck is the encoding and decoding of the video. For larger rooms you need a server that routes the video to all recipients; this is called an SFU. With an SFU you can have hundreds of participants, but not everyone can speak or be seen at once.

For audio-only, the sky is the limit. I used to work on a voice-based social media app, and you need an SFU there as well, but I added a few mixing features so that multiple incoming audio streams would be mixed together into a single outgoing one. Was very fun (and scalable).
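
The same "many inputs in, one stream out" idea in the browser is only a few lines with the Web Audio API; the mixer I actually wrote ran server-side on decoded audio, so treat this purely as a sketch of the concept:

  // Mix several incoming audio MediaStreams into a single outgoing stream.
  function mixAudio(incoming: MediaStream[]): MediaStream {
    const ctx = new AudioContext();
    const mixed = ctx.createMediaStreamDestination();
    for (const stream of incoming) {
      // Each source feeds the shared destination; its stream carries the sum.
      ctx.createMediaStreamSource(stream).connect(mixed);
    }
    return mixed.stream;  // hand this one stream to a single RTCPeerConnection
  }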

It's not very scalable. The usual rules of webrtc apply, so once you get past a certain number of users, you have to switch to an SFU approach.