I'm not making a case for anything; I'm just explaining what exists. If copying were going to be done in bulk, it would have to be done asynchronously to some extent, though CPUs already work like that on a small scale due to instruction reordering.

Now it might be less necessary, because CPUs are so fast with contiguous data in memory that copying it to other parts of memory is less of a bottleneck.
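
To make "asynchronously" a bit more concrete, here's a rough software-level sketch of a bulk copy that runs in the background while the caller does something else. This only illustrates the idea; it is not how any particular hardware or OS does it. `copy_async` and `demo_copy_matches` are names I made up, and it assumes nobody touches either buffer until the copy finishes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <future>
#include <vector>

// Hypothetical helper: start copying src into dst on another thread and
// return a future that becomes ready once the copy has finished.
// Assumption: the caller keeps both buffers alive and does not read or
// write them until the future has been waited on.
std::future<void> copy_async(const std::vector<unsigned char>& src,
                             std::vector<unsigned char>& dst) {
    dst.resize(src.size());
    return std::async(std::launch::async, [&src, &dst] {
        if (!src.empty()) {
            // One contiguous block copy, the case CPUs handle quickly.
            std::memcpy(dst.data(), src.data(), src.size());
        }
    });
}

// Example use: kick off the copy, overlap other work, then wait for it.
bool demo_copy_matches() {
    std::vector<unsigned char> src(1 << 20, 0xAB);  // 1 MiB of sample data
    std::vector<unsigned char> dst;
    std::future<void> done = copy_async(src, dst);
    // ... unrelated work could run here while the copy is in flight ...
    done.get();  // block until the copy is complete
    assert(dst.size() == src.size());
    return dst == src;
}
```

Whether this ever beats a plain `memcpy` depends on the size of the copy; for small buffers the cost of starting a thread will likely be far more than the copy itself, which fits the point that it may be less necessary now.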