I'm not very well versed in macOS internals, but I was a tech lead of a Debian derivative. I also write HPC software and manage the relevant infrastructure from metal to user, so I believe I know some details about processor architectures, general hardware, Linux and *NIX systems in general.
The user-visible layer of an operating system is generally one of the simpler layers to write and maintain, since it's built upon abstractions. However, the libraries powering these layers, esp. the math-heavy and hardware-interacting ones, are much more complex due to the innate complexity of the hardware.
Keeping multiple copies of a library for two different architectures (even if they differ only in bit width), where this seemingly simple change needs different implementation strategies to work correctly, is a pain by itself (for more information, ask the Linux kernel devs, since they're also phasing out 32-bit x86).
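To make the "only the bit-length changes" point a bit more concrete, here is a minimal sketch in Python. The helper names (`word_sizes`, `pack_record`) are made up for illustration, and the numbers you get depend entirely on the platform and build you run it on. It only shows that the same source produces a different native memory layout once the word size changes, which is one reason a library can't simply be recompiled and trusted.

```python
import ctypes
import struct


def word_sizes():
    """Return the size in bytes of a few C types on the running platform."""
    return {
        "pointer": ctypes.sizeof(ctypes.c_void_p),
        "size_t": ctypes.sizeof(ctypes.c_size_t),
        "long": ctypes.sizeof(ctypes.c_long),
        "int": ctypes.sizeof(ctypes.c_int),
    }


def pack_record(value, count):
    """Pack a hypothetical record (one C long, one C int) in native layout."""
    # "@" means native sizes and alignment, so the result differs per platform.
    return struct.pack("@li", value, count)


sizes = word_sizes()
record = pack_record(1, 2)

print("pointer width:", sizes["pointer"] * 8, "bits")
print("C long width: ", sizes["long"] * 8, "bits")
print("record length:", len(record), "bytes")
```

On a typical 64-bit Linux or macOS build, `long` is 8 bytes, while on 32-bit x86 and on 64-bit Windows it is 4, so any file format or wire protocol that dumps such a record verbatim changes size between builds.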
Moreover, x86 and x86_64 are completely different modes on the processor. On top of that, the x86-only mode is called "protected mode" and the x86_64 one is called "long mode"; running x86 code under x86_64 happens in a sub-mode of "long mode" (called "compatibility mode"), and that is already complex enough at the silicon level.
Same complexities apply to ARM and other processor architectures. Silicon doesn't care about the ISA much.
We have already seen that the effort to increase performance on superscalar, out-of-order processors opened up a new, previously untapped family of side-channel/speculative-execution attacks. So processors are complex, software is complex, and multiple architectures on the same hardware are exponentially more complex. If you want to see how the sausage is made, you can also research how Windows handles the backwards-compatibility problem (hint: in ELI5 terms, by keeping complete Windows copies under a single Windows installation).
So, the impressive thing was keeping these multi-arch installations running for quite some time. We need to be able to let things go and free up some software and hardware budget for new innovations and improvements.
Addendum: Funnily, games are one of the harder targets for multi-arch systems, since they are both math-heavy and somewhat closer to the hardware than most applications, and they are very sensitive to architecture changes. Scientific/computational software is another such family, and interestingly it includes databases and office software. Excel also had a nasty floating-point bug back in the day, and the 32-bit and 64-bit installations of Microsoft Office have had some feature differences since the beginning.
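As a rough illustration of why math-heavy code is so sensitive, here is a small Python sketch. It does not reproduce any particular game or Excel bug; it only shows that floating-point addition is not associative, so anything that changes evaluation order or intermediate precision can change the result. Historically, 32-bit x86 builds often used the x87 unit with 80-bit intermediates, while x86_64 builds use SSE2 doubles, so the same source could legitimately produce slightly different numbers. The variable names are just for the example.

```python
import math

# The same three numbers, summed in two different orders.
left_first = (0.1 + 0.2) + 0.3
right_first = 0.1 + (0.2 + 0.3)

# The results are close, but not bit-identical.
print(left_first == right_first)
print(math.isclose(left_first, right_first))
print(abs(left_first - right_first))
```

For most applications a difference in the last bit doesn't matter, but a physics simulation or a lockstep multiplayer game that feeds the result back into itself thousands of times per second can diverge visibly.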