I wish they would just allow us to push everything to GPU as buffer pointers, like buffer_device address extension allows you to, and then reconstruct the data to your required format via shaders.

GPU programming seems to be both super low level, but also high level, because textures and descriptors need these ultra-specific data formats, and then the way you construct and upload those formats is very complicated and changes all the time.

Is there really no way to simplify this?

Regular vertex data was supposed to be strictly pre-formatted in the pipeline too, until suddenly it wasn't, and now we can just give the shader a `device_address` memory pointer from the extension and construct the data from that.
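For reference, this is roughly what that looks like today with `VK_KHR_buffer_device_address` on the API side and `GL_EXT_buffer_reference` in GLSL: the shader receives a raw 64-bit device address (here via a push constant) and indexes the buffer itself instead of using fixed-function vertex input. A minimal vertex-pulling sketch; the names `VertexBuf` and `PC` are made up for illustration:

```glsl
#version 450
#extension GL_EXT_buffer_reference : require

// A "pointer type": a 64-bit device address that can be
// dereferenced like a struct reference inside the shader.
layout(buffer_reference, std430, buffer_reference_align = 16)
buffer VertexBuf {
    vec4 positions[];
};

// The raw address (from vkGetBufferDeviceAddress) arrives as a
// plain push constant -- no descriptor set binding involved.
layout(push_constant) uniform PC {
    VertexBuf verts;   // hypothetical name; 8-byte device pointer
} pc;

void main() {
    // Vertex pulling: the shader does the indexing that the
    // fixed-function vertex input stage would otherwise do.
    gl_Position = pc.verts.positions[gl_VertexIndex];
}
```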

I also want what you're describing. It seems like the ideal "data-in-out" pipeline for purely compute based shaders.

I've brought it up several times when talking with folks who work down at the chip level optimizing these operations, and all I can say is: there are a lot of unforeseen complications to what we're suggesting.

It's not that we can't have a GPU that does these things; it's apparently more a combination of past and current architectural decisions that work against it. For instance, an Nvidia GPU is focused on providing the hardware optimizations necessary for either LLM compute or graphics acceleration, both essentially proprietary technologies.

The proprietariness isn't why it's obtuse, though: you can make a chip go super-duper fast for specific tasks, or more general for all kinds of tasks. Somewhere, folks are making a tradeoff between backwards compatibility and supporting new hardware-accelerated tasks.

Neither of these is a "general purpose compute and data flow" focus. As such, you get a GPU that is only sorta configurable for what you want to do. Which, in my opinion, explains your "GPU programming seems to be both super low level, but also high level" comment.

That's been my experience. I still think what you're suggesting is a great idea and would make GPUs a more open compute platform for a wider variety of tasks, while also simplifying things a lot.

This is true, but what the parent comment is getting at is that we really just want to be able to address graphics memory the same way it's exposed in CUDA, for example, where you can just have pointers to GPU memory in structures visible to the CPU, without this song and dance with descriptor set bindings.

If you got what you're asking for you'd presumably lose access to any fixed function hardware. Re: your example, knowing the data format permits automagic hardware-accelerated translations between image formats.

You're free to do what you're asking for by simply performing all operations manually in a compute shader. You can manually clip, transform, rasterize, and even sample textures. But you'll lose the implicit use of various fixed-function hardware that you currently benefit from.

> If you got what you're asking for you'd presumably lose access to any fixed function hardware.

Are there any fixed functions left that aren't just implemented on the general compute shader hardware anyway?

I guess the ray tracing stuff would qualify, but that isn't what people are complaining about here.

Relevant: "Descriptors are Hard" from XDC 2025 - https://www.youtube.com/watch?v=TpwjJdkg2RE

Even on modern hardware there's still a lot of architectural differences to reconcile at the API level.

I’m not watching Rust as closely as I once did, but it seems like buffer ownership is something it should be leaning on more fully.

There’s an old concurrency pattern where a producer and consumer tag team on two sets of buffers to speed up throughput. Producer fills a buffer, transfers ownership to the consumer, and is given the previous buffer in return.

It is structurally similar to double buffered video, but for any sort of data.

It seems like Rust would be good for proving the soundness. And by now it should be a library rather than roll-your-own.
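As a sketch of that pattern in plain std Rust (channels only, nothing GPU-specific; the names and the `chunks` parameter are made up for illustration): ownership of each buffer moves across the channels, so the compiler itself proves neither side can touch a buffer the other currently holds.

```rust
use std::sync::mpsc;
use std::thread;

// Two buffers ping-pong between producer and consumer. Sending a
// Vec moves ownership, so a side that has handed a buffer off can
// no longer read or write it -- the soundness argument for free.
fn run_pipeline(chunks: u32) -> u32 {
    let (to_consumer, filled) = mpsc::channel::<Vec<u32>>();
    let (to_producer, recycled) = mpsc::channel::<Vec<u32>>();

    // Seed with two buffers: while the consumer drains one,
    // the producer refills the other.
    for _ in 0..2 {
        to_producer.send(Vec::with_capacity(4)).unwrap();
    }

    let producer = thread::spawn(move || {
        for chunk in 0..chunks {
            let mut buf = recycled.recv().unwrap(); // take a buffer back
            buf.clear();
            buf.extend((0..4).map(|i| chunk * 4 + i));
            to_consumer.send(buf).unwrap(); // ownership moves to consumer
        }
    });

    let mut total = 0;
    for _ in 0..chunks {
        let buf = filled.recv().unwrap(); // ownership moves here
        total += buf.iter().sum::<u32>();
        // Return the drained buffer; the producer may already have
        // exited, in which case the send harmlessly fails.
        let _ = to_producer.send(buf);
    }
    producer.join().unwrap();
    total
}

fn main() {
    // 3 chunks covering 0..12, so the total is 0 + 1 + ... + 11 = 66.
    println!("{}", run_pipeline(3));
}
```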

> There’s an old concurrency pattern where a producer and consumer tag team on two sets of buffers to speed up throughput. Producer fills a buffer, transfers ownership to the consumer, and is given the previous buffer in return.

Isn't this just called a swapchain?