These are contradictory things. SIMD instructions are still regular instructions, not some concurrent system for copying. When you say "command", maybe you meant a Windows OS function similar to memcpy. An OS function and an individual CPU instruction are two different things. There is something called DMA, but I don't know how much it is used for memory-to-memory copies.
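To make the point concrete, here is a rough sketch of what a "SIMD copy" looks like in C. The helper name simd_copy is made up for illustration, and the SSE2 path is only one possible choice (it assumes an x86 target; otherwise the plain byte loop does all the work). The takeaway is that each 16-byte load and store is an ordinary instruction the CPU executes in the normal instruction stream, not a separate background copy engine.

```c
#include <stddef.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics (x86 only) */
#endif

/* Hypothetical helper, not a real library function:
 * copies n bytes from src to dst, 16 bytes at a time where SSE2 exists.
 * The regions must not overlap, same as memcpy. */
static void simd_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t i = 0;

#if defined(__SSE2__)
    /* Each iteration is one unaligned 128-bit load and one 128-bit store.
     * The core still issues these itself, one instruction after another. */
    for (; i + 16 <= n; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(s + i));
        _mm_storeu_si128((__m128i *)(d + i), v);
    }
#endif

    /* Tail bytes (or everything, when SSE2 is unavailable). */
    for (; i < n; i++)
        d[i] = s[i];
}
```

Calling simd_copy(dst, src, len) should leave dst with the same bytes memcpy(dst, src, len) would. In practice you would just call memcpy, since library implementations typically already use wide loads and stores internally.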

Well, CPUs already transparently handle memory paging, so why not copying?

https://en.wikipedia.org/wiki/Memory_paging

I'm not making a case for anything; I'm just explaining what exists. If copying were going to be done in bulk, it would have to be done asynchronously to some extent, though CPUs already work like that on a small scale due to instruction reordering.

Now it might be less necessary, because CPUs are so fast with contiguous data in memory that copying to other parts of memory is less of a bottleneck.