If AMD's betting the company on their AI compute, they had best follow the advice in the article because the only way to compete with NVIDIA is to meet/exceed not just the performance but also the DevX.
These days it's definitely the dev environment that is lacking: the hardware is okay (potentially great?!), but the software is abysmal. Running a local LLM in a stable manner means using Vulkan; any attempt at ROCm is hamstrung by haphazard hardware support, alongside an online presence poisoned by people primarily discussing workarounds rather than work when it comes to AMD as a platform. Argh.
Is there any benefit to Vulkan vs. ROCm on a card where ROCm is fully supported?
A Vulkan compute shader is more portable, and chances are the tooling for it will still be supported in a few years for your GPU (which isn't a given for ROCm, especially when dealing with consumer cards).
On my gfx1030 "consumer grade hardware", ROCm means using SDMA, and that is broken on my system. Forcing `HSA_ENABLE_SDMA=0` makes it "work", but also makes loading tensors into VRAM take 15x longer.
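For anyone hitting the same issue, the workaround is just an environment variable. `HSA_ENABLE_SDMA` is a real ROCm HSA runtime setting; the launch command below is hypothetical, so substitute your actual ROCm/HIP workload:

```shell
# Tell ROCm's HSA runtime to skip the SDMA engines and fall back to
# blit-kernel copies. Works around broken SDMA on some systems, at the
# cost of much slower host-to-VRAM transfers.
export HSA_ENABLE_SDMA=0

# Hypothetical launch command; replace with your own workload,
# e.g. a llama.cpp server built with the HIP backend:
#   ./llama-server -m model.gguf -ngl 99

echo "HSA_ENABLE_SDMA=$HSA_ENABLE_SDMA"
```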
You can't have good performance without good DevX. There's a reason we get a new Python DSL for NVIDIA GPUs every week.