It's not just GPU memory, it's I/O memory too. That speeds things up a lot: you just update the pointer to where the data lives, instead of copying it out of I/O memory.
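
To make the "update the pointer, don't copy" idea concrete, here's a rough sketch in plain Python. It is only an analogy: `io_buffer` is a hypothetical stand-in for a region of I/O memory, and `copy_out` / `borrow` are made-up helper names, not any real driver or GPU API. The standard `memoryview` type plays the role of the pointer.

```python
# Hypothetical stand-in for a region of I/O memory (e.g. a buffer a device writes into).
io_buffer = bytearray(b"frame-0001")


def copy_out(buf: bytearray) -> bytes:
    """Copying path: allocates new memory and duplicates every byte."""
    return bytes(buf)


def borrow(buf: bytearray) -> memoryview:
    """Zero-copy path: returns a view that refers to the same memory."""
    return memoryview(buf)


copied = copy_out(io_buffer)  # cost grows with the buffer size
view = borrow(io_buffer)      # constant cost: only a reference is created

# Simulate the device overwriting the tail of the buffer in place
# (same length, so the buffer is not resized while the view exists).
io_buffer[-4:] = b"0002"

print(copied)       # the copy still holds the old bytes
print(bytes(view))  # the view sees the new bytes without any transfer
```

The copy is a snapshot that costs time and memory proportional to the data, while the view just follows whatever is currently in the buffer, which is the speedup being described.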