That abstraction exists in D3D11 and to a lesser extent in Metal via smaller state-group-objects (for instance D3D11 splits the rende state into immutable objects for rasterizer-state, depth-stencil-state, blend-state and (vertex-)input-layout-state (not even needed anymore with vertex pulling).

Even if those state group objects don't match the underlying hardware directly they still reign in the combinatorial explosion dramatically and are more robust than the GL-style state soup.

AFAIK the main problem is state which needs to be compiled into the shader on some GPUs while other GPUs only have fixed-function hardware for the same state (for instance blend state).