Even with a stack, memory can fragment. Suppose you create 10 features on the stack and the last one created is also the last to complete. The memory for the first 9 cannot be released until that last one completes, even if they all finished much earlier.
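To make the scenario concrete, here is a rough sketch in C of a strictly LIFO region holding up to ten items. All the names (`frame_stack_t`, `frame_push`, `frame_complete`, `MAX_FRAMES`) are made up for illustration and do not come from any particular runtime. It only tracks which slots are finished; the point is that the top of the stack cannot move down past an unfinished item.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_FRAMES 10  /* hypothetical capacity, matching the 10 items above */

typedef struct {
    bool   done[MAX_FRAMES]; /* done[i] is true once item i has completed */
    size_t top;              /* number of slots still occupying the stack */
} frame_stack_t;

/* Reserve the next slot; returns its index, or (size_t)-1 if the stack is full. */
size_t frame_push(frame_stack_t *s) {
    if (s->top == MAX_FRAMES) return (size_t)-1;
    s->done[s->top] = false;
    return s->top++;
}

/* Mark item idx as completed, then pop every finished slot at the top.
 * A finished slot below an unfinished one stays allocated. */
void frame_complete(frame_stack_t *s, size_t idx) {
    s->done[idx] = true;
    while (s->top > 0 && s->done[s->top - 1]) {
        s->top--;
    }
}
```

If you push ten items and complete items 0 through 8, `top` stays at 10. Only when item 9 completes does the loop unwind all ten slots at once.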
This problem does not happen with a custom allocator when the objects being allocated are roughly the same size and the allocator hands out same-sized cells.
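For what it's worth, a minimal sketch of such a same-sized-cell allocator in C might look like the following. The names and sizes (`pool_t`, `pool_alloc`, `pool_free`, `CELL_SIZE`, `CELL_COUNT`) are assumptions picked for the example, not anyone's actual implementation. Because every cell is interchangeable, any freed cell can satisfy the next request, so completion order does not matter.

```c
#include <assert.h>
#include <stddef.h>

#define CELL_SIZE  64  /* hypothetical: big enough for the largest object */
#define CELL_COUNT 16  /* hypothetical: worst-case number of live objects */

typedef union cell {
    union cell   *next;            /* used only while the cell is free */
    unsigned char data[CELL_SIZE]; /* used only while the cell is allocated */
} cell_t;

typedef struct {
    cell_t  cells[CELL_COUNT]; /* all storage is reserved up front */
    cell_t *free_list;         /* singly linked list of free cells */
} pool_t;

/* Thread every cell onto the free list, lowest address first. */
void pool_init(pool_t *p) {
    p->free_list = NULL;
    for (size_t i = CELL_COUNT; i-- > 0; ) {
        p->cells[i].next = p->free_list;
        p->free_list = &p->cells[i];
    }
}

/* O(1): pop a cell from the free list; returns NULL when the pool is exhausted. */
void *pool_alloc(pool_t *p) {
    cell_t *c = p->free_list;
    if (c == NULL) return NULL;
    p->free_list = c->next;
    return c;
}

/* O(1): push the cell back; it is immediately reusable, in any order. */
void pool_free(pool_t *p, void *ptr) {
    cell_t *c = ptr;
    assert(c >= p->cells && c < p->cells + CELL_COUNT);
    c->next = p->free_list;
    p->free_list = c;
}
```

The cells here are only aligned as strictly as a pointer, so objects with stricter alignment requirements would need a different cell type. The trade-off is internal waste when objects are much smaller than `CELL_SIZE`, which is why this works best when sizes are roughly uniform.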
Indeed, arena allocators are quite fast and allow you to really lock down the amount of memory that is in use for a particular kind of data. My own approach in the embedded world has always been to simply pre-allocate all of my data structures. If it boots it will run. Dynamic allocation of any kind is always going to have edge cases that will cause run-time issues. Much better to know that you have a deterministic system.