CUDA can be as complex as you want, but its more powerful functionality is opt-in rather than mandatory right from the start. This is where Vulkan absolutely fails: it makes everything maximum effort instead of making the common things easy.

I think CUDA and Vulkan are two completely different beasts, so I don't believe this is a good comparison. One is for GPGPU, and the other is a graphics API with compute shaders.

Also, CUDA targets a single vendor, whereas Vulkan targets as many platforms as possible.

The point still stands: Vulkan chose to go all-in on mandatory maximum complexity instead of providing less complex routes for the common cases. Several extensions in recent years have reduced that burden because this was recognized as an actual issue, which demonstrates that less complexity would have been possible right from the start. Still a long way to go, though.