I have been using Qwen3.5-9B-UD-Q4_K_XL.gguf on an 8GB 3070Ti with llama.cpp server and I get 50-60 tok/s.