Hacker News

I thought memcpy would have launched some sort of built-in mechanism

Where did you get this impression?

I'd expect memcpy calls to turn into __builtin_memcpy and then into raw loads/stores for known small N, or into a call into compiler-rt for unknown or large N. If that doesn't happen, patches to do it for your architecture would likely be appreciated.
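A rough sketch of the two cases, for anyone who wants to check the generated assembly on their own target. The function names (load_u32, store_u32, copy_bytes) are made up for illustration, and what the compiler actually emits depends on the compiler, flags, and architecture, so treat the comments as what is typical rather than guaranteed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fixed, small size (4 bytes): optimizing compilers typically lower this
   memcpy to a single load instruction rather than a function call. */
uint32_t load_u32(const unsigned char *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Same idea in the other direction: typically a single store. */
void store_u32(unsigned char *p, uint32_t v) {
    memcpy(p, &v, sizeof v);
}

/* Size only known at run time: this usually stays a call to a memcpy
   routine supplied by the C library or the compiler's runtime. */
void copy_bytes(void *dst, const void *src, size_t n) {
    memcpy(dst, src, n);
}
```

Compiling with something like `cc -O2 -S` and reading the output is the easiest way to see which path a given target takes.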

CyberDildonics a day ago

Calling a function with 'builtin' in the name doesn't mean it's embedded in the CPU itself to run concurrently which I think is what they thought might exist.

EGreg 3 days ago

From my college days, which were quite long ago. And working with Win32 "BitBlt" requests to the OS, etc.

And also, it would just make sense. If copying entire blocks or memory pages, as with "BitBlt", is one command, why would I need CPU cycles to actually do it? It would seem like the lowest-hanging fruit to automate in SDRAM.

It just seems like the easiest example of SIMD.

CyberDildonics 3 days ago

These are contradictory things. SIMD instructions are still regular instructions, not some concurrent system for copying. When you say command, maybe you meant a Windows OS function that was similar to memcpy. An OS function and individual CPU instructions are two different things. There is something called DMA, but I don't know how much that is used for memory-to-memory copies.
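To make the SIMD point concrete, here is a minimal sketch of a copy written with SSE2 intrinsics. The name copy_simd is hypothetical, and this is not how any particular memcpy is implemented. Each intrinsic maps to an ordinary load or store instruction that the CPU executes in program order like any other; it just moves 16 bytes at a time. On targets without SSE2 the sketch falls back to the plain byte loop.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: copies n bytes from src to dst (no overlap). */
void copy_simd(unsigned char *dst, const unsigned char *src, size_t n) {
    size_t i = 0;
#if defined(__SSE2__)
    /* 16 bytes per iteration: one unaligned vector load, one vector store.
       The CPU still issues and retires every one of these instructions. */
    for (; i + 16 <= n; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(src + i));
        _mm_storeu_si128((__m128i *)(dst + i), v);
    }
#endif
    /* Remaining tail bytes (or everything, when SSE2 is unavailable). */
    for (; i < n; i++) {
        dst[i] = src[i];
    }
}
```

Nothing here runs in the background: the calling thread is busy for the whole copy, which is the difference from a DMA-style transfer.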

EGreg 3 days ago

Well, CPUs already transparently handle memory paging, so why not copying?

https://en.wikipedia.org/wiki/Memory_paging

CyberDildonics 3 days ago

I'm not making a case for anything; I'm just explaining what exists. If copying were going to be done in bulk, it would have to be done asynchronously to some extent, though CPUs already work like that on a small scale due to instruction reordering.

Now it might be less necessary, because CPUs are so fast with contiguous memory that copying to other parts of memory is less of a bottleneck.