If AMD is betting the company on AI compute, it had best follow the advice in the article, because the only way to compete with NVIDIA is to meet or exceed not just the performance but also the DevX.

These days it's definitely the dev environment that is lacking: the hardware is okay (potentially great?!), the software abysmal. Running a local LLM in a stable manner means using Vulkan; any attempt at ROCm is hamstrung by haphazard hardware support, alongside an online presence poisoned by people primarily discussing work-arounds rather than work when it comes to AMD as a platform. Argh.
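For illustration, the "stable path" via Vulkan looks roughly like this with llama-cpp-python built against the Vulkan backend (a sketch; the build flag reflects recent versions, and the model path is a placeholder):

```python
# Assumes llama-cpp-python was installed with the Vulkan backend enabled, e.g.
#   CMAKE_ARGS="-DGGML_VULKAN=on" pip install llama-cpp-python --no-cache-dir
from llama_cpp import Llama

llm = Llama(
    model_path="./models/model.Q4_K_M.gguf",  # placeholder GGUF file
    n_gpu_layers=-1,  # offload all layers to the GPU (Vulkan here)
)

out = llm("Q: Why is the sky blue? A:", max_tokens=64)
print(out["choices"][0]["text"])
```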

Is there any benefit to Vulkan over ROCm on a card where ROCm is fully supported?

A Vulkan compute shader is more portable, and chances are the tooling for it will still be supported for your GPU in a few years (which isn't a given for ROCm, especially with consumer cards).

On my gfx1030 "consumer-grade hardware", ROCm means using SDMA, and that is broken on my system. Forcing `HSA_ENABLE_SDMA=0` makes it "work", but also makes loading tensors to VRAM take 15x longer.
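For anyone hitting the same thing, the workaround looks roughly like this (assuming a ROCm build of PyTorch; the variable has to be set before the runtime initializes, so before the first `import torch`):

```python
import os
# Disable the SDMA copy engines; the runtime falls back to slower copy paths.
os.environ["HSA_ENABLE_SDMA"] = "0"

import torch  # the ROCm build picks the env var up at initialization

x = torch.randn(1024, 1024)
x = x.to("cuda")  # on ROCm builds of PyTorch, "cuda" maps to the HIP device
print(x.device)
```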

You can't have good performance without good DevX. There's a reason we get a new Python DSL for NVIDIA GPUs every week.
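Triton is probably the best-known of those DSLs, and a toy vector-add shows why they keep appearing: the kernel is a few decorated lines of Python instead of a CUDA C build setup (sketch, assumes a CUDA device and the `triton` package):

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n, BLOCK: tl.constexpr):
    pid = tl.program_id(axis=0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n                       # guard the tail of the array
    x = tl.load(x_ptr + offs, mask=mask)
    y = tl.load(y_ptr + offs, mask=mask)
    tl.store(out_ptr + offs, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = (triton.cdiv(n, 1024),)        # one program instance per block
    add_kernel[grid](x, y, out, n, BLOCK=1024)
    return out

a = torch.randn(4096, device="cuda")
b = torch.randn(4096, device="cuda")
assert torch.allclose(add(a, b), a + b)
```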