The issue is that a DMA setup from user space either:
A: requires the DMA engine to know about each user process's memory mappings (i.e. hardware support for understanding the CPU page tables), or
B: spends time going from user mode to kernel mode and back (we invented io_uring and other mechanisms precisely to avoid that).
To some extent I guess the IOMMUs available to modern graphics cards partially solve this, but I'm not sure it's a free lunch (i.e. managing the mappings might still fall partly to the driver/OS).
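
To make option A concrete, here's a rough toy model of what "the device understands per-process mappings" means. This is only a sketch: `ToyIOMMU`, `map_page`, `translate` and the 4 KiB page size are names and assumptions I made up for illustration, not any real driver or hardware interface.

```python
PAGE_SIZE = 4096  # assumed page size for this toy model


class ToyIOMMU:
    """Hypothetical per-process translation table that a device consults."""

    def __init__(self):
        # process tag -> {virtual page number: physical frame number}
        self.tables = {}

    def map_page(self, proc, vaddr, paddr):
        # Stands in for the driver/OS work: someone has to install this entry.
        self.tables.setdefault(proc, {})[vaddr // PAGE_SIZE] = paddr // PAGE_SIZE

    def translate(self, proc, vaddr):
        # What the device would do on each DMA access: look up the
        # process's mapping and fault if there is none.
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        try:
            pfn = self.tables[proc][vpn]
        except KeyError:
            raise LookupError(f"toy IOMMU fault: proc={proc} vaddr={vaddr:#x}") from None
        return pfn * PAGE_SIZE + offset


iommu = ToyIOMMU()
iommu.map_page(proc=1, vaddr=0x7000_2000, paddr=0x0001_5000)
phys = iommu.translate(1, 0x7000_2010)  # same page, 0x10 bytes in
```

The point of the toy is that `map_page` doesn't come for free: something on the driver/OS side still has to keep those entries in sync with the process's address space, which is why I doubt it's a free lunch.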