See also: psychoacoustics. The ear doesn't just do frequency decomposition. It's not clear if it even does frequency decomposition. What actually happens is a lot of perceptual modelling and relative amplitude masking, which makes it possible to do real-time source separation.

Which is why we can hear individual instruments in a mix.
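To put a rough number on the "relative amplitude masking" part: perceptual audio coders often use the classic Schroeder, Atal & Hall (1979) spreading function to approximate how much a loud component raises the hearing threshold in neighbouring critical bands (critical bands come up a couple of paragraphs down). A minimal sketch, with the published constants, nothing specific to this discussion:

```python
import math

def schroeder_spread_db(dz_bark):
    """Approximate masking level, in dB relative to the masker, at a distance
    of dz_bark critical bands (Bark) from it (Schroeder, Atal & Hall, 1979)."""
    x = dz_bark + 0.474
    return 15.81 + 7.5 * x - 17.5 * math.sqrt(1.0 + x * x)

# The curve falls off steeply below the masker (roughly -25 dB/Bark) and much
# more gently above it (roughly -10 dB/Bark), so a loud tone hides quieter
# content sitting just above it in frequency:
for dz in (-2, -1, 0, 1, 2, 4):
    print(f"{dz:+d} Bark: {schroeder_spread_db(dz):6.1f} dB")
```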

And this ability to separate sources can be trained, just as pitch perception can be trained, with results ranging from increased acuity up to full perfect pitch.

A component near the bottom of all that is range-based perception of consonance and dissonance, driven by the relationships between beat frequencies and fundamentals.
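
The standard quantitative model for that is the Plomp-Levelt sensory dissonance curve, which peaks when two tones sit about a quarter of a critical band apart. A minimal sketch in the parameterization popularized by Sethares, using his published constants and assuming two pure sine tones:

```python
import math

def pl_dissonance(f1_hz, f2_hz, a1=1.0, a2=1.0):
    """Sensory dissonance of two pure tones (Plomp-Levelt curve,
    Sethares' parameterization with his published constants)."""
    b1, b2 = 3.5, 5.75                    # rise/fall rates of the curve
    d_star, s1, s2 = 0.24, 0.0207, 18.96  # put the peak near 1/4 critical band
    f_lo, f_hi = sorted((f1_hz, f2_hz))
    s = d_star / (s1 * f_lo + s2)         # scales the curve with critical bandwidth
    x = f_hi - f_lo
    return a1 * a2 * (math.exp(-b1 * s * x) - math.exp(-b2 * s * x))

# Near zero at unison, maximal for tones a fraction of a critical band apart
# (fast beating), and falling off again as the interval widens:
for cents in (0, 50, 100, 300, 700, 1200):
    f2 = 440.0 * 2 ** (cents / 1200)
    print(f"{cents:5d} cents: {pl_dissonance(440.0, f2):.3f}")
```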

Instead of a vanilla Fourier transform, frequencies are divided into multiple critical bands (q.v.) with different properties and effects.
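
Concretely, the Zwicker & Terhardt (1980) closed-form approximations give both the critical-band rate (the Bark scale) and the bandwidth of the band around any given frequency. A quick sketch using those published formulas:

```python
import math

def bark(f_hz):
    """Critical-band rate in Bark (Zwicker & Terhardt 1980 approximation)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def critical_bandwidth_hz(f_hz):
    """Approximate critical bandwidth around centre frequency f_hz."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (f_hz / 1000.0) ** 2) ** 0.69

# Bandwidth is roughly constant (~100 Hz) below about 500 Hz and grows roughly
# in proportion to frequency above that - nothing like the uniform bins of a
# vanilla Fourier transform:
for f in (100, 500, 1000, 4000, 10000):
    print(f"{f:6d} Hz -> {bark(f):5.2f} Bark, bandwidth ~ {critical_bandwidth_hz(f):6.0f} Hz")
```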

What's interesting is that the critical bands seem to be dynamic, so they can be tuned to some extent depending on what's being heard.

Most audio theory has a vanilla EE take on all of this, with concepts like SNR, dynamic range, and frequency resolution.
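
Those quantities are easy to write down, which is partly why they dominate. A quick sketch of the textbook formulas (16-bit PCM and a 4096-point FFT at 44.1 kHz are just illustrative figures):

```python
import math

def snr_db(signal_rms, noise_rms):
    """Signal-to-noise ratio in dB from RMS amplitudes."""
    return 20.0 * math.log10(signal_rms / noise_rms)

def dynamic_range_db(bits):
    """Theoretical dynamic range of an ideal N-bit quantizer (~6 dB per bit)."""
    return 20.0 * math.log10(2 ** bits)

def fft_bin_width_hz(sample_rate_hz, fft_size):
    """Frequency resolution of an N-point FFT: one fixed bin width everywhere."""
    return sample_rate_hz / fft_size

print(snr_db(1.0, 0.001))              # 60.0 dB
print(dynamic_range_db(16))            # ~96 dB for 16-bit audio
print(fft_bin_width_hz(44100, 4096))   # ~10.8 Hz per bin, at every frequency
```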

But the experience of audio is hugely more complex. The brain-ear system is an intelligent system which actively classifies, models, and predicts sounds, speech, and music as they're being heard, at various perceptual levels, all in real time.

Yes, indeed, to think about the ear as the thing that hears is already a huge error. The ear is - at best - a faulty transducer with its own unique way of turning air pressure variations into nerve impulses, and what the brain does with those impulses is as much a part of hearing as the mechanics of the ear. A computer keyboard doesn't interpret your keystrokes either; it just turns them into electrical signals.