Hacker News

cybertim 3 hours ago

I bought two 3080 20GB cards and one of those MACHINIST X99 mainboards as well (one with two full x16 PCIe slots). Those boards come with a Xeon CPU included (for the PCIe lane support). It set me back 800 euros total (I had a spare PSU, SSD and memory in a drawer), and now I'm also happily running Qwen 3.6 Q8 (MTP) at 80 tk/s.

iMil 2 hours ago

Good call, I really hesitated between the X570 and the X99. Are you using P2P?

cybertim 2 hours ago

  $ nvidia-smi topo -p2p r
          GPU0    GPU1
  GPU0    X       CNS
  GPU1    CNS     X

I guess not. I use llama.cpp with:

--spec-draft-n-max 3 --spec-type draft-mtp --split-mode tensor --tensor-split 1,1

and my generation speed is between 60 and 80 tk/s.

I'll test this uncensored model, with ngram added as well, this weekend.

btw, I also set my power limit to 220 W per card (with nvidia-smi). That will cost you around 1 tk/s but save you a LOT of power and heat :)
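
For anyone wanting to try the same power cap, here is a minimal sketch of how it could be scripted. It assumes GPU indices 0 and 1 and the 220 W figure from the comment above; `set_power_limit` and `DRY_RUN` are hypothetical names made up for this example, while `-i` (GPU index) and `-pl` (power limit in watts) are standard nvidia-smi options. The limit normally needs root and does not survive a reboot, and the allowed range for a given card can be checked with `nvidia-smi -q -d POWER`.

```shell
#!/usr/bin/env bash
# Sketch: apply the same power limit to each listed GPU.
# set_power_limit is a hypothetical helper, not part of nvidia-smi.
# With DRY_RUN=1 the commands are printed instead of executed.
set_power_limit() {
    local watts="$1"
    shift
    local gpu
    for gpu in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "nvidia-smi -i ${gpu} -pl ${watts}"
        else
            # -i selects the GPU index, -pl sets the limit in watts (needs root)
            sudo nvidia-smi -i "${gpu}" -pl "${watts}"
        fi
    done
}
```

Running `DRY_RUN=1 set_power_limit 220 0 1` only prints the two commands, which is a safe way to check them before applying the limit for real with `set_power_limit 220 0 1`.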

iMil 2 hours ago

CNS means "Chipset not supported", and I doubt that is the case. Are you sure you are using the patched nvidia module? Run modinfo nvidia to check which one is loaded.
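
Both checks mentioned here can be sketched as small shell filters. This is a rough sketch, not an official tool: `p2p_status` and `nvidia_module_summary` are hypothetical helper names, and the code meanings follow the legend that nvidia-smi prints under the matrix, as far as I recall it. The `modinfo` fields only identify which module file and version are loaded. To my knowledge the open kernel modules report a license of `Dual MIT/GPL` and the proprietary one reports `NVIDIA`, but that is worth verifying on your own system, and none of these fields says whether a P2P patch was applied.

```shell
#!/usr/bin/env bash
# Sketch only: p2p_status and nvidia_module_summary are hypothetical helper
# names, not part of nvidia-smi or modinfo.

# Reads `nvidia-smi topo -p2p r` output on stdin and prints one line per
# GPU pair with the status code spelled out.
p2p_status() {
    awk '
        BEGIN {
            legend["OK"]  = "ok"
            legend["CNS"] = "chipset not supported"
            legend["GNS"] = "GPU not supported"
            legend["TNS"] = "topology not supported"
            legend["NS"]  = "not supported"
            legend["U"]   = "unknown"
        }
        # Header row: the first line whose first field is a GPU name.
        !have_header && $1 ~ /^GPU[0-9]+$/ {
            for (i = 1; i <= NF; i++) col[i] = $i
            ncol = NF
            have_header = 1
            next
        }
        # Matrix rows: a GPU name followed by one status per column.
        have_header && $1 ~ /^GPU[0-9]+$/ && NF == ncol + 1 {
            for (i = 2; i <= NF; i++) {
                if ($i == "X") continue   # X marks the GPU itself
                meaning = ($i in legend) ? legend[$i] : "unrecognized code"
                printf "%s -> %s: %s (%s)\n", $1, col[i - 1], $i, meaning
            }
        }
    '
}

# Reads `modinfo nvidia` output on stdin and keeps the fields that identify
# which module build is loaded.
nvidia_module_summary() {
    grep -E '^(filename|version|license):'
}
```

Usage would be `nvidia-smi topo -p2p r | p2p_status` and `modinfo nvidia | nvidia_module_summary`. Fed the two-GPU matrix posted above, `p2p_status` prints `GPU0 -> GPU1: CNS (chipset not supported)` and `GPU1 -> GPU0: CNS (chipset not supported)`.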

cybertim an hour ago

I'm using Bazzite on my AI rig just because it has the GPU-optimized things set up (also nvidia-open). From what I can find, P2P seems to be available only for the 90 models of the NVIDIA RTX GPU line, not the 80 models, and some versions of the 50xx? (apparently the 5080?). Anyway, I downloaded that uncensored model and tweaked those KV settings etc. I'm still getting 60-80 tk/s, but I'm able to get my context to 180224 now. It used to be 131072, which gave me some trouble, so this is already a win :)