You are right but it's confusing because there are two different approaches. I guess you could say both approaches improve performance by eliminating context switches and system calls.

1. Kernel bypass combined with DMA and techniques like dedicating a CPU to packet processing improve performance.

2. What I think of as "removing userspace from the data plane" improves performance for things like sendfile and ktls.

To your point, Quic in the kernel seems to not have either advantage.

So... RDMA?

No, the first technique describes the basic way they already operate, DMA, but giving access to userspace directly because it's a zerocopy buffer. This is handled by the OS.

RDMA is directly from bus-to-bus, bypassing all the software.