If you're on Strix Halo, check out Donato's prebuilt toolboxes for ROCm or Vulkan (RADV):

https://github.com/kyuz0/amd-strix-halo-toolboxes

It takes all the work out of it: you just start llama-server inside the container and you're off doing inference without having to figure out the dependencies yourself.
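Roughly, the workflow looks like this (just a sketch; the image tag and model path below are placeholders, so check the repo's README for the actual image names):

```
# Create a toolbox from one of the prebuilt images
# (tag is an assumption -- see the repo's README for the real ROCm/Vulkan tags)
toolbox create llama-rocm --image docker.io/kyuz0/amd-strix-halo-toolboxes:rocm

# Enter the container; llama.cpp and its dependencies are already set up
toolbox enter llama-rocm

# Start llama-server on a local GGUF model (path is a placeholder),
# offloading all layers to the GPU
llama-server -m ~/models/your-model.gguf --port 8080 -ngl 999
```

Then point a browser at http://localhost:8080 for llama-server's built-in web UI, or hit its OpenAI-compatible API on the same port.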

Oh yeah, he's doing great things. I'm not on Strix myself, but funnily enough his dual AMD Radeon AI PRO R9700 build is the same machine I built.