The controls are the only gripe I have with this amazing work.

If I were working in Unity, I would address this by manually placing a set of virtual cameras in the world and using Cinemachine to blend between them. The world is small enough to justify placing and configuring each one by hand. You could also focus on just the complex areas and let the default follow cam handle the rest.
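To make the idea concrete, here is a rough, engine-agnostic sketch in Python. It is not Cinemachine's API, and every name in it (`CameraZone`, `pick_camera`, `blend`, the follow offset) is made up for illustration. It assumes each hand-placed camera owns a box-shaped trigger area, the follow cam is the fallback everywhere else, and the blend is a simple smoothstep ease between positions.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class CameraZone:
    """A hand-placed camera plus the axis-aligned box in which it takes over."""
    name: str
    box_min: Vec3
    box_max: Vec3
    camera_pos: Vec3

    def contains(self, p: Vec3) -> bool:
        # True when the point lies inside the box on all three axes.
        return all(lo <= v <= hi
                   for lo, v, hi in zip(self.box_min, p, self.box_max))


def pick_camera(player: Vec3, zones: List[CameraZone],
                follow_offset: Vec3 = (0.0, 3.0, -6.0)) -> Tuple[str, Vec3]:
    """Return (name, position) of the camera that should be live."""
    for zone in zones:  # first matching zone wins
        if zone.contains(player):
            return zone.name, zone.camera_pos
    # Outside every zone: fall back to a follow cam at a fixed offset
    # from the player (the offset is an arbitrary placeholder).
    follow_pos = tuple(p + o for p, o in zip(player, follow_offset))
    return "follow", follow_pos


def blend(current: Vec3, target: Vec3, t: float) -> Vec3:
    """Ease from current toward target; t in [0, 1] is the blend progress."""
    t = max(0.0, min(1.0, t))
    s = t * t * (3.0 - 2.0 * t)  # smoothstep easing
    return tuple(c + (g - c) * s for c, g in zip(current, target))
```

Each frame you would call `pick_camera` with the player position, and whenever the result changes, run `blend` over a second or so from the old camera position to the new one. Only the tricky spots need a `CameraZone`; everything else falls through to the follow cam.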