> I’d like to know the memory profile of this. The bottleneck is obviously sort which buffers everything in memory.

That's not obvious to me. I checked the manuals for sort(1) in GNU and FreeBSD, and neither of them buffers everything in memory by default. Instead, they read chunks into an in-memory buffer, sort each chunk, and (if there is more than one chunk) use the filesystem as temporary storage for an external merge sort.
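To make the chunk-then-merge idea concrete, here is a minimal Python sketch of an external merge sort. It is not how either sort(1) is actually implemented: the function name `external_sort` and the `chunk_size` parameter are made up for illustration, it counts lines rather than bytes, it spills even a single chunk to disk, and it merges all runs in one pass instead of in batches.

```python
import heapq
import tempfile
from itertools import islice


def external_sort(lines, chunk_size=100_000):
    """Yield lines in sorted order, holding at most chunk_size lines in memory per run.

    Hypothetical helper for illustration only; ordering is plain
    code-point comparison, not locale-aware collation.
    """
    it = iter(lines)
    runs = []
    try:
        while True:
            # Read one chunk, make sure every line is newline-terminated
            # (so lines stay separate on disk), and sort it in memory.
            chunk = sorted(
                line if line.endswith("\n") else line + "\n"
                for line in islice(it, chunk_size)
            )
            if not chunk:
                break
            # Spill the sorted run to a temporary file and rewind it.
            run = tempfile.TemporaryFile(mode="w+")
            run.writelines(chunk)
            run.seek(0)
            runs.append(run)
        # k-way merge: only one pending line per run is held at a time.
        yield from heapq.merge(*runs)
    finally:
        for run in runs:
            run.close()
```

Peak memory here is roughly one chunk during the first phase and one line per run during the merge, regardless of total input size. With GNU sort, the size of the in-memory buffer and the location of the temporary files can be adjusted with `-S` and `-T`.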

This sorting program was originally developed with memory-starved computers in mind, and the legacy shows.