This is really interesting – AU's notion of having separate input and output "elements" (buses, more or less) is one of the worst parts of the whole API.
I understand why these design decisions were made historically, but they don't really enable any functionality that the other APIs lack. The problem is that the host can call `Render` more than once per sample block (ideally it calls it exactly once per block per element, but nothing says it can't call it redundantly), so plugins have to do extra bookkeeping around the `AudioTimeStamp` just to work out whether a given call starts a new block or repeats the current one. And for what? There's nothing AU can do that the other formats can't.
If a plugin has multiple fully independent buses, the model mostly works, but as soon as anything is shared between them, things get even more complicated. Say you have two stereo elements with no routing between them, but MIDI input or sample-accurate parameter changes arrive for the given block. Now you have to keep those events around until the host has moved on to the next block, which means the plugin has to maintain even more internal buffering. This sort of lifetime ambiguity was one of the worst parts of VST2. In VST2, how long does MIDI input last? "Until the next call to `process()`." In AUv2, how long do MIDI input or parameter change events last? "For every render call at the next timestamp, but not the timestamp after that." Great, thanks.
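To make that lifetime rule concrete, here's roughly the bookkeeping I'm describing. `Event` and `EventBookkeeping` are made-up plugin internals, not anything in the AU API; the only real pieces here are `AudioTimeStamp` and its flags.

```cpp
#include <AudioToolbox/AudioToolbox.h>
#include <vector>

struct Event {            // a MIDI message or a sample-accurate parameter change
    Float64 sampleOffset;
    // ...payload...
};

struct EventBookkeeping {
    std::vector<Event> pending;       // delivered since the current block started
    std::vector<Event> currentBlock;  // events that apply to lastSampleTime
    Float64 lastSampleTime = -1.0;

    // Call at the top of every Render.
    void onRender(const AudioTimeStamp &ts) {
        if (!(ts.mFlags & kAudioTimeStampSampleTimeValid))
            return;
        if (ts.mSampleTime != lastSampleTime) {
            // New timestamp: the previous block's events are finally dead, and
            // everything queued since then applies to this block.
            currentBlock = std::move(pending);
            pending.clear();
            lastSampleTime = ts.mSampleTime;
        }
        // Same timestamp: a redundant Render for another element, so
        // currentBlock has to be left alone and re-used.
    }
};
```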
Modern plugins, upon receiving a `Render` for a new timestamp, just render all of their elements at once, buffer all the outputs internally, and then copy the right buffer out on each per-element `Render` call. So it reduces to the same thing the other APIs do, just with more pageantry.
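Something like this, where `MyAU`, `ElementCache`, and `renderAllElements` are hypothetical plugin internals (the real AU entry point, flag handling, and error checking are omitted):

```cpp
#include <AudioToolbox/AudioToolbox.h>
#include <cstring>
#include <vector>

struct ElementCache {
    std::vector<std::vector<float>> channels;   // cached output for one element
};

struct MyAU {
    std::vector<ElementCache> outputs;          // one cache per output element
    Float64 lastSampleTime = -1.0;

    void renderAllElements(UInt32 frames);      // the actual DSP; fills `outputs`

    OSStatus Render(const AudioTimeStamp *ts, UInt32 element,
                    UInt32 frames, AudioBufferList *ioData) {
        if (ts->mSampleTime != lastSampleTime) {
            // First call for this block: do all the DSP for every element now.
            renderAllElements(frames);
            lastSampleTime = ts->mSampleTime;
        }
        // Every call, first or redundant, just copies the cached element out.
        const ElementCache &cache = outputs[element];
        for (UInt32 ch = 0; ch < ioData->mNumberBuffers; ++ch)
            std::memcpy(ioData->mBuffers[ch].mData,
                        cache.channels[ch].data(),
                        frames * sizeof(float));
        return noErr;
    }
};
```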
And yet, plugin instances having to manage their own input connection types is somehow even worse. Again, I understand why it was done this way – allowing plugins to "pull" their own inputs lets an audio graph basically run itself with very little effort from the host – it can just call `Render` on the final node of the chain, and all of the inputs come along for free.
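For anyone who hasn't touched the C API, this is roughly how a host wires a two-unit chain and renders it with one call (in real code the connection is made once at setup, not per block, and error handling is omitted):

```cpp
#include <AudioToolbox/AudioToolbox.h>

// unitA -> unitB, then a single AudioUnitRender on unitB pulls unitA implicitly.
OSStatus renderChain(AudioUnit unitA, AudioUnit unitB, UInt32 frames,
                     Float64 sampleTime, AudioBufferList *output) {
    AudioUnitConnection conn = {unitA, /*sourceOutputNumber*/ 0, /*destInputNumber*/ 0};
    AudioUnitSetProperty(unitB, kAudioUnitProperty_MakeConnection,
                         kAudioUnitScope_Input, 0, &conn, sizeof(conn));

    AudioTimeStamp ts = {};
    ts.mSampleTime = sampleTime;
    ts.mFlags = kAudioTimeStampSampleTimeValid;

    AudioUnitRenderActionFlags flags = 0;
    // One call on the last node of the chain; unitB calls Render on unitA itself.
    return AudioUnitRender(unitB, &flags, &ts, /*outputBus*/ 0, frames, output);
}
```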
It's a compelling value proposition, but unfortunately it fully prevents any sort of graph-level multithreading. Modern hosts do all sorts of graph theory to determine which parts of the graph can be executed in parallel, which means the host has to be in charge of deciding which plugins to render and when. Even Logic does this now. The "pull model" of an AU instance directly calling `Render` on its input connections is a relic of the past.
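What hosts actually do is closer to this toy version (purely illustrative, not AU-specific, and no real host would spin up `std::async` on the audio path): group nodes into dependency "waves" and run each wave in parallel, which only works if the host, not the plugins, drives rendering.

```cpp
#include <algorithm>
#include <functional>
#include <future>
#include <vector>

struct Node {
    std::vector<size_t> inputs;     // indices of upstream nodes
    std::function<void()> render;   // processes one block for this node
};

// Assumes the graph is acyclic. Depth = longest path from any source, i.e. the
// earliest "wave" in which all of a node's inputs are guaranteed to be done.
void renderBlock(std::vector<Node> &graph) {
    std::vector<int> depth(graph.size(), -1);
    std::function<int(size_t)> depthOf = [&](size_t i) {
        if (depth[i] >= 0) return depth[i];
        int d = 0;
        for (size_t in : graph[i].inputs) d = std::max(d, depthOf(in) + 1);
        return depth[i] = d;
    };
    int maxDepth = 0;
    for (size_t i = 0; i < graph.size(); ++i) maxDepth = std::max(maxDepth, depthOf(i));

    // Nodes within one wave have no edges between them, so they can run in parallel.
    for (int level = 0; level <= maxDepth; ++level) {
        std::vector<std::future<void>> wave;
        for (size_t i = 0; i < graph.size(); ++i)
            if (depth[i] == level)
                wave.push_back(std::async(std::launch::async, graph[i].render));
        for (auto &f : wave) f.get();   // barrier before the next wave
    }
}
```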
Anyway. VST3, CLAP, even VST2 support multiple input and output buses (hell, one of my plugins has multiple output buses for "main out", "only transients", and "everything other than a transient") – it's just a question of host support and how they're broken out. Ironically, Logic has one of the clunkiest multi-out implementations I've seen (Bitwig's is far and away the best).
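For comparison, declaring that exact three-output layout in CLAP is about this much plugin-side code. This is from memory of the `clap/ext/audio-ports.h` extension, so treat the field names as approximate:

```cpp
#include <clap/clap.h>
#include <cstdio>

// One stereo input, three stereo outputs; the first output is flagged as "main".
static uint32_t ports_count(const clap_plugin_t *plugin, bool is_input) {
    return is_input ? 1 : 3;
}

static bool ports_get(const clap_plugin_t *plugin, uint32_t index, bool is_input,
                      clap_audio_port_info_t *info) {
    static const char *out_names[] = {"main out", "only transients",
                                      "everything other than a transient"};
    if (index >= ports_count(plugin, is_input))
        return false;
    info->id = index;
    std::snprintf(info->name, sizeof(info->name), "%s",
                  is_input ? "input" : out_names[index]);
    info->flags = (index == 0) ? CLAP_AUDIO_PORT_IS_MAIN : 0;
    info->channel_count = 2;
    info->port_type = CLAP_PORT_STEREO;
    info->in_place_pair = CLAP_INVALID_ID;
    return true;
}

// Handed back from clap_plugin.get_extension() for CLAP_EXT_AUDIO_PORTS.
static const clap_plugin_audio_ports_t audio_ports_ext = {ports_count, ports_get};
```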