> WebGL worked perfectly fine without them

Except it didn't. In the GL programming model it's trivial to accidentially leak the wrong granular render state into the next draw call, unless you always reconfigure all states anyway (and in that case PSOs are strictly better, they just include too much state).

The basic idea of immutable state group objects is a good one, Vulkan 1.0 and D3D12 just went too far (while the state group granularity of D3D11 and Metal is just about right).

> Similarly, WebGPU could have done without that static binding mess.

This I agree with, pre-baked BindGroup objects were just a terrible idea right from the start, and AFAIK they are not even strictly necessary when targeting Vulkan 1.0.

There should be a better abstraction to solve the GL state leakage problem than PSOs. We end up with a combinatory explosion of PSOs when some states they abstract are essentially toggling some bits in a GPU register in no way coupled with the rest of the pipeline state.

That abstraction exists in D3D11 and to a lesser extent in Metal via smaller state-group-objects (for instance D3D11 splits the rende state into immutable objects for rasterizer-state, depth-stencil-state, blend-state and (vertex-)input-layout-state (not even needed anymore with vertex pulling).

Even if those state group objects don't match the underlying hardware directly they still reign in the combinatorial explosion dramatically and are more robust than the GL-style state soup.

AFAIK the main problem is state which needs to be compiled into the shader on some GPUs while other GPUs only have fixed-function hardware for the same state (for instance blend state).

> Except it didn't. In the GL programming model it's trivial to accidentially leak the wrong granular render state into the next draw call

This is where I think Vulkan and WebGPU are chasing the wrong goal: To make draw calls faster. What's even faster, however, is making fewer draw calls and that's something graphics devs can easily do when you provide them with tools like multi-draw. Preferably multi-draw that allows multiple different buffers. Doing so will naturally reduce costly state changes with little effort.

Agreed, this is the console approach with command buffers that get DMAed, and having more code on the GPU side.