Hacker News

Could be that the CUDA backend has seen far more specialization optimizations whereas the seeingly fairly fresh HIP backend hasn't had as many developers looking at it, in the end a few more control instructions on the CPU side to go through the ZLUDA wrapper will be insignificant compared to all the time spent inside better optimized GPU kernels.