Is the protocol inherently inferior in situations like that, or is this because we've spent decades optimizing for TCP and building into kernels and hardware? If we imagine a future where QUIC gets that kind of support, will it still be a downgrade?

There is no inherent performance disadvantage at the speeds most implementations actually run at. With a good QUIC implementation and a good network stack you can drive ~100 Gb/s per core on a regular processor from userspace with a 1500-byte MTU and no segmentation offload, if you use an unencrypted QUIC configuration. If you use encryption, you will bottleneck on encryption/decryption bandwidth at ~20-50 Gb/s, depending on your processor.
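To put the ~100 Gb/s per core figure in per-packet terms, here is a rough back-of-the-envelope sketch. The 3 GHz clock is my assumption, not a measured value, and it treats every packet as a full 1500 bytes with no header overhead:

```python
def packets_per_second(gbps, mtu_bytes=1500):
    """Packets per second needed to sustain `gbps` with full-size packets."""
    return gbps * 1e9 / (mtu_bytes * 8)


def cycles_per_packet(gbps, clock_hz=3.0e9, mtu_bytes=1500):
    """CPU cycles available per packet on one core (clock_hz is an assumed value)."""
    return clock_hz / packets_per_second(gbps, mtu_bytes)


if __name__ == "__main__":
    pps = packets_per_second(100)
    budget = cycles_per_packet(100)
    print(f"{pps / 1e6:.2f} Mpps, {budget:.0f} cycles per packet")
```

Under those assumptions that is roughly 8.3 million packets per second and a budget of about 360 cycles per packet on one core.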

In the Linux kernel benchmarks in [1], unencrypted TCP averages ~24 Gb/s from kernel space with a 1500-byte MTU using segmentation offload. Encrypted transport averages ~11 Gb/s. Even with a 9000-byte MTU, unencrypted TCP only averages ~39 Gb/s. So there is no inherent disadvantage when considering implementations at this performance level.

And yes, that is a link to a Linux kernel QUIC vs Linux kernel TCP comparison. And yes, the Linux kernel QUIC implementation is only driving ~5 Gb/s, which is 20x slower than what I said above is possible for a QUIC implementation. Every QUIC implementation in the wild is dreadfully slow compared to what you could actually achieve with a proper implementation.

In theory, TCP has a small fundamental advantage because it does not have to handle multiple streams, which might give it a ~2x performance advantage when comparing perfectly optimal implementations. But at that point you are comparing per-core control plane throughput with a 1500-byte MTU of, by my estimation, ~300 Gb/s for QUIC vs ~600 Gb/s for TCP, and both are probably bottlenecked on your per-core memory bandwidth anyway.
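To see why memory bandwidth becomes the limit at those estimated rates, here is a small conversion sketch. It assumes each payload byte crosses memory at least once and full 1500-byte packets. I am deliberately not plugging in a per-core memory bandwidth number, since that varies by machine:

```python
def gigabytes_per_second(gbps):
    """Convert a line rate in Gb/s to GB/s of payload touched per second."""
    return gbps / 8


def million_packets_per_second(gbps, mtu_bytes=1500):
    """Packet rate in Mpps for full-size packets at the given line rate."""
    return gbps * 1e9 / (mtu_bytes * 8) / 1e6


if __name__ == "__main__":
    # ~300 Gb/s (QUIC) and ~600 Gb/s (TCP) are the estimates from the text.
    for name, gbps in (("QUIC", 300), ("TCP", 600)):
        gb = gigabytes_per_second(gbps)
        mpps = million_packets_per_second(gbps)
        print(f"{name}: {gb:.1f} GB/s, {mpps:.0f} Mpps")
```

So the estimates correspond to 37.5 GB/s and 75 GB/s of data moving through a single core, which you can compare against whatever per-core memory bandwidth your hardware actually delivers.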

[1] https://lwn.net/ml/all/cover.1751743914.git.lucien.xin@gmail...