I agree with you in that I think there's a better programming model out there. But using a buffer in a CUDA kernel is the simple case. Descriptors exist to bind general purpose work to fixed function hardware. It's much more complicated when we start talking about texture sampling. CUDA isn't exactly great here either. Kernel launches are more heavyweight than calling draw precisely because they're deferring some things like validation to the call site. Making descriptors explicit is verbose and annoying but it makes resource switching more front of mind, which for workloads primarily using those fixed function resources is a big concern. The ultimate solution here is bindless, but that obviously presents it's own problems for having a nice general purpose API since you need to know all your resources up front. I do think CUDA is probably ideal for many users but there are trade-offs here still.