It seems that the performance of memory copy depends on the CPU architecture and on a careful combination of prefetching options, register type, and instructions. We found this through thorough experiments and published the results in a recent paper [1].
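To make those knobs concrete, here is a minimal sketch in C. It assumes GCC or Clang (for `__builtin_prefetch`) and is not the method from [1]. The function name `copy_u64_prefetch`, the 64-bit general-purpose register width, the prefetch distance of 256 bytes, and the locality hint are all hypothetical, illustrative choices. On real hardware they would be tuned per CPU, and SIMD registers or non-temporal store instructions are further options.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical prefetch distance in bytes; the best value is CPU-specific. */
#define PREFETCH_DISTANCE 256

/* Copy n bytes from src to dst through 64-bit general-purpose registers,
 * issuing a software prefetch PREFETCH_DISTANCE bytes ahead of the read
 * position once per 64-byte block. Assumes the regions do not overlap. */
static void copy_u64_prefetch(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t i = 0;

    for (; i + 8 <= n; i += 8) {
        /* Prefetch once per 64 bytes, only while the target stays inside
         * the source buffer. Args: read access (0), locality hint 0
         * (no temporal locality), an assumed choice for a streaming copy. */
        if ((i & 63) == 0 && i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(s + i + PREFETCH_DISTANCE, 0, 0);

        uint64_t w;
        memcpy(&w, s + i, sizeof w); /* 8-byte load, safe when unaligned */
        memcpy(d + i, &w, sizeof w); /* 8-byte store */
    }

    /* Copy any remaining tail bytes one at a time. */
    for (; i < n; i++)
        d[i] = s[i];
}
```

Timing this against `memcpy` while varying `PREFETCH_DISTANCE`, the locality hint, or the word size is one way to observe the architecture dependence. The numbers will differ from one CPU to another.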