For instance, VST3 does not support many-to-one sound processing architectures, and only awkwardly supports two-to-one via second-class "side-chaining". But the need for such architectures may not be clear to those who only know the one-to-one audio flow ubiquitous in digital audio workstation ("DAW") designs. VST3 fits neatly into that architecture. In 2025, I don't think more of the same is particularly innovative. In contrast, the AudioUnit spec is open-ended and has an internal graph-based audio flow design that can readily accommodate other signal processing architectures. If you don't want to think outside the "DAW", you don't have to, but some of us musicians do.
> For instance, VST3 does not support many-to-one sound processing architectures, and only awkwardly supports two-to-one via second-class "side-chaining"
This is a limitation of your host and plugins, not of VST3: plugins can declare arbitrarily many input/output busses for audio and events, which allows many-to-many connections. It's just that in practice hosts don't like this, and JUCE has a busted interface for it.
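To illustrate: on the plugin side, extra busses are a few lines in the SDK. A minimal sketch using the stock `AudioEffect` helpers from the VST3 SDK (the processor class and bus names here are made up):

```cpp
// Hypothetical VST3 processor declaring several input busses in initialize().
// The class and bus names are illustrative; addAudioInput / addAudioOutput /
// addEventInput are the standard AudioEffect helpers from the VST3 SDK.
#include "public.sdk/source/vst/vstaudioeffect.h"

using namespace Steinberg;
using namespace Steinberg::Vst;

class MultiInProcessor : public AudioEffect
{
public:
    tresult PLUGIN_API initialize (FUnknown* context) override
    {
        tresult result = AudioEffect::initialize (context);
        if (result != kResultOk)
            return result;

        // Several audio inputs feeding one output: many-to-one is a matter of
        // declaring the busses; whether the host exposes them is another story.
        addAudioInput  (STR16 ("Main In"),  SpeakerArr::kStereo);
        addAudioInput  (STR16 ("Aux In 1"), SpeakerArr::kStereo, kAux);
        addAudioInput  (STR16 ("Aux In 2"), SpeakerArr::kStereo, kAux);
        addAudioOutput (STR16 ("Main Out"), SpeakerArr::kStereo);
        addEventInput  (STR16 ("MIDI In"), 16);

        return kResultOk;
    }
};
```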
This is really interesting – AU's notion of having separate input and output "elements" (buses, more or less) is one of the worst parts of the whole API.
I understand why historically these design decisions were made, but it's not like they really enable any functionality that the other APIs don't. It's just that since the host can call `Render` more than once per sample block (ideally the host would call it once per sample block per element, but there's nothing saying the host can't call it redundantly), there's additional bookkeeping that plugins have to do surrounding the `AudioTimeStamp`. And for what? There's nothing AU can do that the other formats can't.
If a plugin has multiple fully independent buses, the model mostly works, but if there's any interdependence then things get even more complicated. Say you have two separate stereo elements that don't have any routing between them, but there's MIDI input or sample-accurate parameter changes for the given block. Now you have to keep those events around until the host has moved on to the next block, which means the plugin has to maintain even more internal buffering. This sort of lifetime ambiguity was one of the worst parts of VST2. In VST2, how long does MIDI input last? "Until the next call to `process()`." In AUv2, how long do MIDI input data or parameter change events last? "All of the render calls of the next timestamp, but then not the timestamp after that." Great, thanks.
Modern plugins, upon receiving a `Render` call for a new timestamp, will just render all of their elements at once, internally buffer all the outputs, and then copy the buffers out on each per-element `Render` call. So it reduces to the same thing the other APIs do, just with more pageantry.
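Roughly, that bookkeeping looks like the sketch below. `ElementCache` and the zero-fill stand-in for the DSP are placeholders, not any real SDK API; the `AudioTimeStamp` comparison is the point:

```cpp
// Sketch of the "render all elements once per timestamp, hand out cached
// buffers on every Render call for that timestamp" pattern.
#include <AudioToolbox/AudioToolbox.h>
#include <algorithm>
#include <vector>

struct ElementCache
{
    Float64 lastSampleTime = -1.0;            // sample time of the block we last rendered
    std::vector<std::vector<float>> outputs;  // one mono buffer per element, for brevity
};

static OSStatus RenderElement (ElementCache& cache,
                               UInt32 element,
                               const AudioTimeStamp* inTimeStamp,
                               UInt32 inNumberFrames,
                               AudioBufferList* ioData)
{
    // First Render call for a new timestamp: run the DSP for *all* elements
    // now and cache the results.
    if (inTimeStamp->mSampleTime != cache.lastSampleTime)
    {
        for (auto& buf : cache.outputs)
            buf.assign (inNumberFrames, 0.0f);   // stand-in for the real per-element DSP
        cache.lastSampleTime = inTimeStamp->mSampleTime;
    }

    // Every call for this timestamp, first or redundant, just copies the
    // cached output for the requested element into the host's buffer list.
    auto* dst = static_cast<float*> (ioData->mBuffers[0].mData);
    std::copy (cache.outputs[element].begin (),
               cache.outputs[element].end (), dst);
    return noErr;
}
```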
And yet, plugin instances having to manage their own input connection types is somehow even worse. Again, I understand why it was done this way – allowing plugins to "pull" their own inputs lets an audio graph basically run itself with very little effort from the host – it can just call `Render` on the final node of the chain, and all of the inputs come along for free.
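That pull model is what `kAudioUnitProperty_MakeConnection` is for. A rough sketch from the host's side, assuming two AudioUnits that have already been instantiated and initialized elsewhere (the function name is made up):

```cpp
// Wire two AudioUnits together with kAudioUnitProperty_MakeConnection, then a
// single AudioUnitRender on the last unit pulls the whole chain.
#include <AudioToolbox/AudioToolbox.h>

OSStatus connectAndRender (AudioUnit sourceUnit, AudioUnit effectUnit,
                           const AudioTimeStamp* ts, UInt32 frames,
                           AudioBufferList* outBuffers)
{
    // Tell the effect that its input element 0 is fed by the source's output 0.
    AudioUnitConnection conn = { sourceUnit, /*sourceOutputNumber*/ 0,
                                 /*destInputNumber*/ 0 };
    OSStatus err = AudioUnitSetProperty (effectUnit,
                                         kAudioUnitProperty_MakeConnection,
                                         kAudioUnitScope_Input, 0,
                                         &conn, sizeof (conn));
    if (err != noErr)
        return err;

    // One Render call on the tail of the chain; the effect calls Render on its
    // connected input itself, so the source runs "for free".
    AudioUnitRenderActionFlags flags = 0;
    return AudioUnitRender (effectUnit, &flags, ts, 0, frames, outBuffers);
}
```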
It's a compelling value proposition, but unfortunately it fully prevents any sort of graph-level multithreading. Modern hosts do all sorts of graph theory to determine which parts of the graph can be executed in parallel, which means the host has to be in charge of deciding which plugins to render and when. Even Logic does this now. The "pull model" of an AU instance directly calling `Render` on its input connections is a relic of the past.
Anyway. VST3, CLAP, even VST2 support multiple input and output buses (hell, one of my plugins has multiple output buses for "main out", "only transients", and "everything other than a transient") – it's just a question of host support and how they're broken out. Ironically, Logic has one of the clunkiest multi-out implementations I've seen (Bitwig's is far and away the best).
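For comparison, here's roughly what declaring a multi-out layout looks like via CLAP's audio-ports extension; the bus names mirror the ones above, the rest is illustrative:

```cpp
// Sketch of a CLAP audio-ports extension exposing one stereo input and three
// stereo outputs. Struct and field names are from clap/ext/audio-ports.h.
#include <clap/clap.h>
#include <cstdio>

static uint32_t audio_ports_count (const clap_plugin_t*, bool is_input)
{
    return is_input ? 1 : 3;   // one stereo in, three stereo outs
}

static bool audio_ports_get (const clap_plugin_t*, uint32_t index,
                             bool is_input, clap_audio_port_info_t* info)
{
    static const char* out_names[] = { "Main Out", "Only Transients", "Everything Else" };
    if (index >= audio_ports_count (nullptr, is_input))
        return false;

    info->id            = index;
    info->channel_count = 2;
    info->port_type     = CLAP_PORT_STEREO;
    info->in_place_pair = CLAP_INVALID_ID;
    info->flags         = (index == 0) ? CLAP_AUDIO_PORT_IS_MAIN : 0;
    std::snprintf (info->name, sizeof (info->name), "%s",
                   is_input ? "Main In" : out_names[index]);
    return true;
}

static const clap_plugin_audio_ports_t audio_ports_ext = {
    audio_ports_count,
    audio_ports_get,
};
```

The host then decides how (or whether) to break those ports out into channels, which is exactly where the clunkiness lives.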