Hacker News

>I would strongly advise against this. GPUs are highly efficient when neighboring threads within a warp access neighboring data and follow largely the same code path. Even across warps, data locality is highly desirable.

It's a bit like saying that writing code at all is bad, though. Divergence isn't desirable, but neither is running any code at all; sometimes you need it to solve a problem.

Not supporting divergence at all is a huge mistake IMO. It isn't good, but sometimes it's necessary.
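
To put a rough shape on the cost, here is a minimal sketch under a deliberately simplified SIMT model: a warp runs each distinct branch path taken by its threads one after another, with the other lanes masked off. The `warp_passes` helper and the warp size of 32 are my own assumptions for illustration, not anything from Nvidia's docs, and real hardware scheduling is more involved than this.

```python
WARP_SIZE = 32  # assumption: 32 lanes per warp


def warp_passes(branch_ids, warp_size=WARP_SIZE):
    """Return, per warp, how many serialized passes the simplified model needs.

    branch_ids[i] is the (hypothetical) branch path taken by thread i.
    Each distinct path inside a warp costs one pass.
    """
    passes = []
    for start in range(0, len(branch_ids), warp_size):
        warp = branch_ids[start:start + warp_size]
        passes.append(len(set(warp)))
    return passes


# Branch chosen per warp: all threads in a warp agree, so no divergence.
coherent = [(tid // WARP_SIZE) % 2 for tid in range(64)]

# Branch chosen by lane parity: every warp contains both paths.
divergent = [tid % 2 for tid in range(64)]

print(warp_passes(coherent))
print(warp_passes(divergent))
```

In this model the lane-parity case costs twice as many passes per warp as the per-warp case. That is a real cost, but it is bounded and predictable, which is why I'd rather have divergence supported and pay for it only where the problem needs it.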

>Could you kindly share a source for this? Shader Execution Reordering (SER) is available for Ray tracing, but it is not a general-purpose feature that can be used in generic compute shaders.

https://docs.nvidia.com/cuda/cuda-programming-guide/03-advan...

My understanding is that this is fully transparent to the programmer; it's just more advanced scheduling for threads. SER is something different entirely.

Nvidia are a bit vague here, so you have to go digging into patents if you want more information on how it works.