For performance-critical code, you wouldn't use malloc() allocation at all. But whether you use an arena allocator or put stuff on the stack, your argument still holds: data locality is speed.
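To make the arena idea concrete, here is a minimal sketch of a bump allocator over a caller-supplied buffer. The names (`Arena`, `arena_init`, `arena_alloc`, `arena_reset`) are made up for illustration and don't come from any particular library. It assumes single-threaded use and a power-of-two alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bump arena: all allocations come from one contiguous buffer,
   so objects allocated back to back also sit back to back in memory. */
typedef struct {
    unsigned char *base; /* start of the backing buffer (stack, static, or heap) */
    size_t cap;          /* total size of the buffer in bytes */
    size_t used;         /* bytes handed out so far, including padding */
} Arena;

static void arena_init(Arena *a, void *buf, size_t cap) {
    a->base = buf;
    a->cap = cap;
    a->used = 0;
}

/* Returns size bytes aligned to align, or NULL if the arena is full.
   Assumption: align is a nonzero power of two. */
static void *arena_alloc(Arena *a, size_t size, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);

    uintptr_t cur = (uintptr_t)a->base + a->used;
    uintptr_t aligned = (cur + (align - 1)) & ~(uintptr_t)(align - 1);
    size_t pad = (size_t)(aligned - cur);
    size_t left = a->cap - a->used;

    if (pad > left || size > left - pad) {
        return NULL; /* out of space; no fallback to malloc() in this sketch */
    }
    void *p = a->base + a->used + pad;
    a->used += pad + size;
    return p;
}

/* Frees everything at once by rewinding the bump pointer. */
static void arena_reset(Arena *a) {
    a->used = 0;
}
```

Typical use would be to back the arena with a stack or static array, call `arena_alloc` for everything a frame or request needs, and call `arena_reset` once at the end instead of freeing objects one by one. Allocation is a pointer bump, and consecutive allocations end up adjacent, which is where the locality comes from.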