> If you're in a 5-person peer-to-peer WebRTC video call where you receive 4 streams of video, you also need to send 4 streams of video. This is scalable in a sense; the uplink and downlink requirements are equal.

The issue is not with the throughput: a typical videoconference stream requires some 700 kbit/s, so even a 12 Mbit/s uplink is enough for 17 streams or so. The issue is with having to encode the video separately for every receiver.
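A quick back-of-the-envelope sketch of that arithmetic, taking the 700 kbit/s per-stream figure above as the assumption:

```typescript
// Rough mesh bandwidth estimate: in an n-person peer-to-peer call,
// every participant uploads n - 1 copies of their own video.
const STREAM_KBPS = 700; // assumed per-stream bitrate from the text above

function meshUplinkKbps(participants: number): number {
  return (participants - 1) * STREAM_KBPS;
}

console.log(meshUplinkKbps(5));  // 2800 kbit/s for the 5-person call
console.log(meshUplinkKbps(18)); // 11900 kbit/s, about a 12 Mbit/s uplink
```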

WebRTC adapts to the available throughput by encoding the video separately for every receiver, with different parameters. If you're in a five-person peer-to-peer conference, you decode four videos simultaneously, which is fine, but you're also encoding your video four times, which is not.
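A minimal browser-side sketch of where that cost comes from, assuming four remote peers and omitting all signaling: in a mesh, each remote peer gets its own RTCPeerConnection, and attaching the same camera track to each connection gives each connection its own independent encoder.

```typescript
// Mesh sketch: one RTCPeerConnection per remote peer. peerIds is a
// hypothetical list of the four other participants; offer/answer and
// ICE exchange are omitted.
const peerIds = ["alice", "bob", "carol", "dave"];

async function joinMesh(): Promise<void> {
  const stream = await navigator.mediaDevices.getUserMedia({ video: true });
  const [track] = stream.getVideoTracks();

  for (const id of peerIds) {
    const pc = new RTCPeerConnection();
    // Each addTrack creates a separate RTCRtpSender on a separate
    // connection, so the same camera frame is encoded four times,
    // each time with that connection's own target bitrate.
    pc.addTrack(track, stream);
    // ... exchange offer/answer and ICE candidates with `id` here ...
  }
}
```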

An SFU works around the issue by not reencoding the video: the SFU merely decrypts each packet and reencrypts it with the symmetric key it negotiated with every receiver. Since AES is implemented in hardware, the reencryption comes essentially for free.
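As a toy model (not real SRTP: key derivation, per-packet IVs and authentication are all elided), forwarding a packet costs one symmetric decryption plus one symmetric encryption per receiver, and never touches the codec:

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Toy model of SFU forwarding: the payload is never reencoded, only
// reencrypted with each receiver's symmetric key.
function forward(
  encryptedPayload: Buffer,
  iv: Buffer,
  senderKey: Buffer,
  receiverKeys: Buffer[],
): Buffer[] {
  // One AES decryption with the key negotiated with the sender...
  const d = createDecipheriv("aes-128-ctr", senderKey, iv);
  const payload = Buffer.concat([d.update(encryptedPayload), d.final()]);

  // ...then one AES encryption per receiver. No video codec is
  // involved, which is why this is cheap compared to reencoding.
  return receiverKeys.map((key) => {
    const e = createCipheriv("aes-128-ctr", key, iv);
    return Buffer.concat([e.update(payload), e.final()]);
  });
}

// 16-byte keys and IV, as in SRTP's default AES_CM_128 suite.
const senderKey = randomBytes(16);
const receiverKeys = [1, 2, 3, 4].map(() => randomBytes(16));
const out = forward(randomBytes(1200), randomBytes(16), senderKey, receiverKeys);
console.log(out.length); // 4 copies of the packet, one per receiver
```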

(Of course, that implies that the SFU needs to use other techniques for bandwidth adaptation, such as simulcast or scalable video coding (SVC). See slides 10-12 of https://galene.org/galene-20250610.pdf if you're interested.)
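For the simulcast case, the browser-side API looks roughly like this (the rid names and bitrates are arbitrary choices, and the offer/answer exchange with the SFU is omitted): the sender encodes a small fixed set of layers once, and the SFU picks a layer per receiver instead of asking the sender to reencode.

```typescript
// Simulcast sketch: the sender encodes three fixed quality layers once;
// the SFU then forwards whichever layer fits each receiver's downlink.
async function publishSimulcast(): Promise<void> {
  const pc = new RTCPeerConnection();
  const stream = await navigator.mediaDevices.getUserMedia({ video: true });

  pc.addTransceiver(stream.getVideoTracks()[0], {
    direction: "sendonly",
    sendEncodings: [
      { rid: "low", scaleResolutionDownBy: 4.0, maxBitrate: 150_000 },
      { rid: "mid", scaleResolutionDownBy: 2.0, maxBitrate: 500_000 },
      { rid: "high", maxBitrate: 1_500_000 },
    ],
  });
  // ... send the offer to the SFU and apply its answer here ...
}
```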