I'd expect memcpy calls to turn into __builtin_memcpy, and from there into raw loads/stores when N is known and small, or a call into compiler-rt when N is unknown or large. If that doesn't happen on your architecture, patches to make it happen would likely be appreciated.
A function having 'builtin' in its name doesn't mean it's embedded in the CPU itself and runs concurrently with your code, which I think is what they thought might exist. It only means the compiler recognizes the call and is free to expand it inline.
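If you want to check what your toolchain does, here's a rough sketch of the two cases. The function names load_u32 and copy_bytes are made up for illustration, and the lowering mentioned in the comments is what I'd typically expect from GCC or Clang with optimization enabled, not something guaranteed for every target.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Size is a compile-time constant (4 bytes): with optimization on,
 * compilers commonly replace this memcpy with a plain load. */
static uint32_t load_u32(const unsigned char *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Size is only known at run time: this usually stays an ordinary
 * call to a memcpy routine in the runtime library. */
static void copy_bytes(void *dst, const void *src, size_t n) {
    memcpy(dst, src, n);
}
```

Compiling with something like -O2 -S and reading the assembly should show whether the first function still contains a call. Either way it's all normal code running on the same core, nothing happens in the background.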