Not sure you really need huggingface-cli to download anything if you're just using llama.cpp. You can pass `-hf ...` and it will download the models for you. Set `LLAMA_CACHE` to change where the downloads go:
LLAMA_CACHE="models" ./llama-server \
-hf unsloth/gemma-4-31B-it-GGUF:UD-Q4_K_XL \
...
Yes.
-hfd for the draft model.
Nice, was wondering if there was a flag for the draft as well.
Not knocking huggingface-cli, just find it's much easier for people to try out this stuff when they can just