Hacker News

If the card supports Vulkan and the model has GGUF weights, llama.cpp is an option: its Vulkan support is excellent, actively developed, and not that far behind CUDA where speed is concerned.

* https://github.com/ggml-org/llama.cpp/releases
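
For the GGUF half of that condition, a quick sanity check is possible because GGUF files start with the four ASCII bytes `GGUF`. A minimal sketch in Python; `is_gguf` is a hypothetical helper name of my own, not part of llama.cpp, and it only looks at the magic bytes, so it does not validate the rest of the file:

```python
GGUF_MAGIC = b"GGUF"  # the first four bytes of a GGUF file


def is_gguf(path):
    """Return True if the file at `path` starts with the GGUF magic bytes.

    Hypothetical helper for illustration: it checks the header magic only,
    not the format version, metadata, or tensor data.
    """
    with open(path, "rb") as f:
        return f.read(4) == GGUF_MAGIC
```

Call it as `is_gguf("model.gguf")`, where the path is a placeholder for whatever weights file you downloaded.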