I bought two 3080 20GB cards and one of those MACHINIST X99 mainboards as well (one with two full x16 PCIe slots). Those boards come with a Xeon CPU included (for the PCIe lane support). It set me back 800 euros total (I had a spare PSU, SSD and memory in a drawer), and now I'm also happily running Qwen 3.6 Q8 (MTP) at 80 tk/s.
Good call, I really hesitated between the X570 and the X99. Are you using P2P?
$ nvidia-smi topo -p2p r
GPU0 GPU1
GPU0 X CNS
GPU1 CNS X
i guess not, i use llama.cpp with:
--spec-draft-n-max 3 --spec-type draft-mtp --split-mode tensor --tensor-split 1,1
and my generation speed is between 60 and 80 tk/s
I'll test this uncensored model this weekend, with ngram added as well
btw, I also set my power limit to 220 W per card (with nvidia-smi). That will cost you around 1 tk/s but save you a LOT of power and heat :)
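For anyone wanting to try the power limit, here is a minimal sketch of how it could be scripted. The function name `set_power_limit` and the `DRY_RUN` switch are made up for this example. The GPU indices 0 and 1 are an assumption about a two-card box. `nvidia-smi -i <index> -pl <watts>` is the actual call doing the work, and it needs root.

```shell
#!/usr/bin/env bash
# Sketch: cap the power limit on one or more GPUs via nvidia-smi.
# set_power_limit and DRY_RUN are hypothetical names for this example.
# Usage: set_power_limit <watts> <gpu index> [<gpu index> ...]

set_power_limit() {
    local watts="$1"; shift
    local gpu
    for gpu in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Only print the command, do not touch the hardware.
            echo "sudo nvidia-smi -i $gpu -pl $watts"
        else
            sudo nvidia-smi -i "$gpu" -pl "$watts"
        fi
    done
}

# Example (assumes GPUs 0 and 1): DRY_RUN=1 set_power_limit 220 0 1
```

The limit set this way does not survive a reboot, so it would have to be reapplied at startup (for example from a boot script).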
CNS means "Chipset not supported", and I doubt that's really the case. Are you sure you are using the patched nvidia module? Run modinfo nvidia to check which one is loaded.
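To make the matrix easier to read, here is a rough sketch that spells out each GPU pair. The status codes are the ones listed in the legend that nvidia-smi prints under the matrix. The function names `p2p_status` and `explain_p2p` are made up for this example, and the parsing assumes the layout shown in the paste above (a header row of GPU names, then one row per GPU).

```shell
#!/usr/bin/env bash
# Sketch: explain each cell of the `nvidia-smi topo -p2p r` matrix.
# p2p_status and explain_p2p are hypothetical helper names.

p2p_status() {
    # Map a status code from the nvidia-smi legend to its meaning.
    case "$1" in
        X)   echo "self" ;;
        OK)  echo "status OK" ;;
        CNS) echo "chipset not supported" ;;
        GNS) echo "GPU not supported" ;;
        TNS) echo "topology not supported" ;;
        NS)  echo "not supported" ;;
        U)   echo "unknown" ;;
        *)   echo "unrecognized code: $1" ;;
    esac
}

explain_p2p() {
    # Reads the matrix on stdin. The first line starting with a GPU name
    # is treated as the column header; later GPU lines are matrix rows.
    awk '$1 ~ /^GPU[0-9]+$/ {
        if (!ncol) { for (i = 1; i <= NF; i++) col[i] = $i; ncol = NF; next }
        for (i = 2; i <= NF; i++)
            if ($1 != col[i-1]) print $1, col[i-1], $i
    }' | while read -r from to code; do
        echo "$from -> $to: $(p2p_status "$code")"
    done
}

# Usage: nvidia-smi topo -p2p r | explain_p2p
```

For the module check, `modinfo -F version nvidia` and `modinfo -F filename nvidia` narrow the modinfo output down to the version string and the module file path, which may be enough to tell a stock build from a patched one, depending on how the patched module was installed.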
I'm using Bazzite on my AI rig just because it has the GPU-optimized things set up (also nvidia-open). Looking into it, P2P seems to be available only for the 90 versions of the NVIDIA RTX GPU line, not the 80 ones, and some versions of the 50xx? (apparently the 5080?). Anyway, I downloaded that uncensored model and tweaked those KV settings etc. I'm still getting 60-80 tk/s, but I'm able to get my context up to 180224 now. It used to be 131072, which gave me some trouble, so this is already a win :)