> I guess it would be best if the sender allocates its payload directly on the shared memory when it’s created.

On an SMP system, yes. On a NUMA system it depends on things like your access patterns, since the cost of reaching the shared pages differs depending on which node they live on.
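
For what it's worth, here is a rough sketch of what "allocate the payload directly in shared memory" could look like, using Python's standard `multiprocessing.shared_memory` module (Python 3.8+). The length-prefixed layout and the names `HEADER`, `send_in_place` and `receive` are made up for illustration, not taken from any particular library, and the sketch says nothing about NUMA placement.

```python
import struct
from multiprocessing import shared_memory

# Hypothetical message layout: 4-byte little-endian length, then the payload bytes.
HEADER = struct.Struct("<I")


def send_in_place(payload_size, fill):
    """Create a shared segment and let `fill` write the payload straight into it."""
    shm = shared_memory.SharedMemory(create=True, size=HEADER.size + payload_size)
    HEADER.pack_into(shm.buf, 0, payload_size)
    # Hand the caller a view of the shared bytes, so the payload is never
    # staged in private memory and copied over afterwards.
    view = shm.buf[HEADER.size:HEADER.size + payload_size]
    try:
        fill(view)
    finally:
        view.release()  # views must be released before the segment can be closed
    return shm


def receive(name):
    """Attach to an existing segment by name and return a copy of its payload."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        (size,) = HEADER.unpack_from(shm.buf, 0)
        view = shm.buf[HEADER.size:HEADER.size + size]
        try:
            return bytes(view)
        finally:
            view.release()
    finally:
        shm.close()


def write_hello(view):
    # Example producer: builds the payload directly in the shared view.
    view[:] = b"hello"
```

The sender would call `send_in_place(5, write_hello)`, pass `shm.name` to the receiver, and later call `shm.close()` and `shm.unlink()` once the segment is no longer needed.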