I think CUDA and Vulkan are two completely different beasts, so I don't believe this is a good comparison. One is for GPGPU, and the other is a graphics API with compute shaders.
Also, CUDA is targeting a single vendor, whereas Vulkan is targeting as many platforms as possible.
The point still stands: Vulkan chose to go all-in on mandatory maximum complexity, instead of providing less-complex routes for the common cases. Several extensions in recent years have reduced that burden because it was recognized that this is an actual issue, and it demonstrated that less complexity would have been possible right from the start. Still a long way to go, though.