A vulkan computer shader is more portable, chances are also that the tooling for it will still be supported in a few years for your GPU (which isn't a given for ROCm especially when dealing with consumer cards)
On my gfx1030 "consumer grade hardware", ROCm means using SDMA, and that is broken for my system. Forcing `HSA_ENABLE_SDMA=0` makes it "work", but also makes loading tensors to VRAM take 15x longer.
A vulkan computer shader is more portable, chances are also that the tooling for it will still be supported in a few years for your GPU (which isn't a given for ROCm especially when dealing with consumer cards)
On my gfx1030 "consumer grade hardware", ROCm means using SDMA, and that is broken for my system. Forcing `HSA_ENABLE_SDMA=0` makes it "work", but also makes loading tensors to VRAM take 15x longer.