Might be getting better but just yesterday I dabbled in Vulkan again, digging through the examples from https://github.com/SaschaWillems/Vulkan, and the complexity is pure insanity. What should be a simple malloc ends up being 40 lines of code, what should be a simple memcpy is another 30 lines of code, and what should be a single-line kernel launch is another 50 lines of bindings, layouts, pipelines, etc.
Tbf, a lot of the complexity (also in the official Khronos samples) is caused by insane C++ abstraction layers and 'helper frameworks' on top of the actual Vulkan C API.
Just directly talking to the C API in the tutorials/examples instead of custom wrapper code would be a lot more helpful since you'd don't need to sift through the custom abstraction layers (even if it would be slightly more code).
E.g. have a look at the code snippets in here and weep in despair ;)
https://docs.vulkan.org/tutorial/latest/03_Drawing_a_triangl...
Why should these things be simple? Graphics hardware varies greatly even across generations from the same vendors. Vulkan as an API is trying to offer the most functionality to as much of this hardware as possible. That means you have a lot of dials to tweak.
Trying to obfuscate all the options goes against what Vulkan was created for. Use OpenGL 4.6/WebGPU if you want simplicity.
A simple vkCreateSystemDefaultDevice() function like on Metal instead of requiring hundreds of lines of boilerplate would go a long way to make Vulkan more ergonomic without having to give up a more verbose fallback path for the handful Vulkan applications that need to pick a very specific device (and then probably pick the wrong one on exotic hardware configs).
And the rest of the API is full of similar examples of wasting developer time for the common code path.
Metal is a great example of providing both: a convenient 'beaten path' for 90% of use cases but still offering more verbose fallbacks when flexibility is needed.
Arguably, the original idea to provide a low-level explicit API also didn't quite work. Since GPU architectures are still vastly different (especially across desktop and mobile GPUs), a slightly more abstract API would be able to provide more wiggle room for drivers to implement an API feature more efficiently under the hood, and without requiring users to write different code paths for each GPU vendor.
Metal has the benefit of being developed by Apple for Apple devices. I'd imagine that constraint allows them to simplify code paths in a way Vulkan can't/won't. Again, Metal doesn't have to deal with supporting dozens of different hardware systems like Vulkan does.
Metal also works for external GPUs like NVIDIA or AMD though (not sure how much effort Apple still puts into those use cases, but Metal itself is flexible enough to deal with non-Apple GPUs).
CUDA can be complex if you want, but it offers more powerful functionality as an option that you can choose, rather than mandating maximum complexity right from the start. This is where Vulkan absolutely fails. It makes everything maximum effort, rather than making the common things easy.
I think CUDA and Vulkan are two completely different beasts, so I don't believe this is a good comparison. One is for GPGPU, and the other is a graphics API with compute shaders.
Also, CUDA is targeting a single vendor, whereas Vulkan is targeting as many platforms as possible.
The point still stands: Vulkan chose to go all-in on mandatory maximum complexity, instead of providing less-complex routes for the common cases. Several extensions in recent years have reduced that burden because it was recognized that this is an actual issue, and it demonstrated that less complexity would have been possible right from the start. Still a long way to go, though.